DOCUMENT RESUME

ED 072 228                                              VT 018 606

AUTHOR        Creager, John A.
TITLE         Noneconomic Analysis Considerations for Management
              and Information System for Occupational Education.
INSTITUTION   Management and Information System for Occupational
              Education, Winchester, Mass.
PUB DATE      15 Jun 72
NOTE          136p.; Occasional Paper-7

EDRS PRICE    MF-$0.65 HC-$6.58
DESCRIPTORS   Algorithms; Design Preferences; Economic Research;
              Educational Planning; Information Retrieval;
              *Management Information Systems; Man Machine Systems;
              *Mathematical Models; *Operations Research;
              Simulation; *Statistical Analysis; Systems
              Development; *Vocational Education
IDENTIFIERS   *Management Information System Occupational Educa;
              Massachusetts; MISOE

ABSTRACT

As the first of two papers delineating the design of
Massachusetts' Management and Information System for Occupational
Education (MISOE), these specific dimensions of MISOE structure and
function are considered: (1) the distinction between economic and
noneconomic analysis, (2) distinctions among census, sample, and
other data, (3) the distinction between descriptive and simulative
analysis, and (4) functional levels, management levels, and
management scope. Information retrieval and analysis for MISOE
necessitates: (1) translation of inquiries into analytic hypotheses,
(2) the selection of pertinent MISOE subsystems, data types and
levels, analytical operations, and models, (3) performing the
analyses and interpreting their results, and (4) reporting the
results to the inquiry source. Discussions of general analysis
requirements and considerations precede the detailing of specific
analytical models and algorithms for MISOE, such as multiple linear
regression and factor analysis. Dynamic simulation, linear
programming, and nonlinear programming models are discussed, in
addition to specific noneconomic analysis factors to consider within
and among MISOE's subsystems of static space. Technical reports on
MISOE's research methodology are appended. Related documents are
available in this issue as VT 018 600, VT 018 602, VT 018 809, and VT
018 810. (AG)






FILMED FROM BEST AVAILABLE COPY 






OCCASIONAL PAPER #7 



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM THE PERSON
OR ORGANIZATION ORIGINATING IT.
POINTS OF VIEW OR OPINIONS STATED
DO NOT NECESSARILY REPRESENT
OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.



NONECONOMIC ANALYSIS CONSIDERATIONS FOR MISOE



By 



John A. Creager



June 15, 1972 


Management and Information System for Occupational Education
1017 Main Street, Winchester, Massachusetts 01890 
Telephone: 617-729-9260 






Preface 



This paper is one of a series prepared by the staff and a team of 
consultants to delineate and document the design of the Management and In- 
formation System for Occupational Education. It is the first of two such 
papers by the author, submitted as the formal response to staff inquiries, 
and as major tangible products of the consultation relationship. Gratitude 
is expressed to the staff for its extensive help, in documentation and in 
conferences. Although all reasonable effort has been made to be both rele-
vant and accurate, the author disclaims infallibility, and encourages the staff
to be both selective and flexible in its use of the aids offered. 



TABLE OF CONTENTS

PART I. General Definition of Context, Scope, and Depth of Analysis
Considerations

A. The Context
B. General Analysis Requirements
C. "Dimensions" for Analysis Requirements

PART II. General Analysis Considerations

Sources and Control of Inferential Errors
General Software Considerations
Missing Data Problems
Formulation of "Mixes"
Followup Problems
Implications of "Initial Data Points" and Cohort Replacement
Some Functional Issues in Data Processing and the Computer Facility

PART III. The Analysis Repertoire of MISOE: Models and Algorithms

Multiple Linear Regression
Relative Contributions in Regression
The Control of Process-Product and Process-Impact Analyses
for Differential Input
Principal Components and Factor Analysis
Canonical and Discriminant Analysis
Temporal Analysis
Miscellaneous Analysis Tools of Moderate to Lower Priority

PART IV. Noneconomic Analysis Considerations within and Among Subsystems

Introduction
The Range Restriction Problem
Analysis Considerations for Noneconomic Factors in the Process Space
Analysis Involving the Product Space
Analysis Involving the Impact Space
The Educational Human Input and Student Spaces
Analysis Across IPPI Spaces

PART V. Simulation Models

Introduction
General Consideration of Dynamic Simulation
Equations and Data Sources
Inferential Errors in Dynamic Simulation
A Pseudodynamic Model as Nonlinear Programming
Figure 1. Pseudodynamic Model for Process-Product Inquiry
A More Rational Approach
A Linear Programming Solution

PART VI. Epilogue

APPENDIXES

Appendix A  Technical Reports for MISOE
Appendix B  Memoranda for MISOE


NONECONOMIC ANALYSIS CONSIDERATIONS FOR MISOE
John A. Creager 

Part I. General Definition of Context, Scope, and Depth of Analysis Considerations 
A. The Context 

The general purposes, structures, and functions of MISOE have been 
delineated in Monograph No. 1, Occasional Papers 1-6, and in a position paper. 
Although MISOE has primary reference to occupational education in the State of 
Massachusetts, it is recognized that occupational education is imbedded in the 
general state system of education, which in turn is imbedded in the still more 
general system of state concerns for realizing societal values. Moreover, 
MISOE is to be prototypical, i.e., a paradigm for other management and infor- 
mation systems, and therefore, a contribution to management technology as well 
as a practical management tool for occupational education in Massachusetts. 

In the anticipated typical usage of MISOE, an inquiry will be initiated
at some management level and translated into a problem for information retrieval
and analysis. The resulting information and its implications by interpretation
will then be fed back to the source of inquiry as a system response. Where
the initial inquiry is complex, it may be fractionated into subinquiries, each
of which demands information retrieval and analysis, with their implications
integrated into the MISOE response to the inquiry source. An inquiry (or
subinquiry) implies:

1. translation into analytic hypotheses 

2. selection of the relevant MISOE subsystems (spaces and elements) 

3. selection of the relevant data types and levels 

4. selection of the relevant models and analytical operations

5. performance of the analyses

6. interpretation of analysis results 

7. reporting of interpreted results to inquiry source. 



An inquiry may involve economic information (e.g., costs, time, etc.),
noneconomic information (quality and quantity of manpower, educational
programs, etc.), or both. Occasional Papers No. 7 and No. 9 are addressed to
considerations of the noneconomic analysis aspects of MISOE, while Occasional
Paper No. 8 is addressed to the consideration of economic analysis. Because
complex inquiries, especially those received from the state management level,
are likely to involve both economic and noneconomic aspects in relation to
each other, more intensive attention must be given in further MISOE develop-
ment to integrating these aspects in analysis. This paper assumes that such
interrelations will occur analytically in many of the simulation analyses
rather than in the descriptive (nonsimulation) analyses, but it is recognized
that some of the economic-noneconomic relationships will be descriptive in-
formation useful in formulating auxiliary equations that moderate flow rates
in dynamic space. Although this paper is focused on noneconomic analysis
considerations, some further consideration of these matters will be made in a
later section (in Part Five) on the interface and communication between des-
criptive and simulation analysis.
B. General Analysis Requirements 

The general requirements for the analytic aspects of MISOE must recognize 
the demands that MISOE have: 

1. generality in terms of level and scope of inquiry, 

2. flexibility in terms of changes in educational programs, available 
technology, variations in inquiry types, 

3. expandability in terms of new programs, new issues and inquiries, new
data types, and enlarged capability to simulate finer aspects of "reality",

4. continuity of operational capability regardless of personnel or other 
changes, and 

5. sensitivity to the needs of potential inquirers of the system. 






Moreover, the analytic aspects of MISOE must interface and intercommunicate
all aspects of MISOE structures and functions. Thus, the analytic capabilities 
contribute: 

1. to generality by MISOE having in its repertoire general models and
computer programs,

2. to flexibility by having a broad repertoire of such models and programs
with many specific options, 

3. to expandability by being aware of analytical tools not immediately 
required, but of potential value as MISOE expands, 

4. to continuity by having thorough documentation and referencing of all 
models, computer programs, and analyses actually performed, and 

5. to sensitivity to potential user needs by the scope and depth of the 
MISOE analysis repertoire. 

C. "Dimensions" for Analysis Requirements 

This section defines some major "dimensions" of MISOE structure and func-
tion. Analytic considerations which are general and therefore cut across such
dimensions are discussed in Parts Two and Three. Part Two discusses those topics 
which cut across the models and algorithms, which constitute the analysis rep- 
ertoire and which are discussed in Part Three. The present discussion of 
"dimensions" defines the categories on each dimension and how they are treated. 

The first dimension is the distinction between economic and noneconomic 
analysis requirements and was discussed above. In terms of the analysis types 
stated in Occasional Paper No. 1, the emphasis in this paper on noneconomic 
analysis implies emphasis on A2 (process-product), A4 (product-impact), and A5
(process-impact).

The second dimension distinguishes census, sample, and other data. This
paper assumes that all analysis above the entry level (and some of it there)
is concerned with the census populations, or subcategories thereof, and that
sample data will be appropriately weighted to be representative of those pop-
ulations. The requirements to ensure that this is so for all data types and
analysis levels will be delineated in Occasional Paper No. 12, which will
specify sampling and weighting procedures. Data from external sources, e.g.,
U.S. Census, Project CAREER, Project TALENT, the Cooperative Institutional
Research Program (CIRP) of the American Council on Education (ACE), will be
only partially connectable for comparative and normative information at analysis
levels 1 and 2 (see Occasional Paper No. 2).

The distinction between descriptive and simulative analysis constitutes a 
third dimension. Descriptive analysis includes the estimation of population
parameters from sample data (discussed for entry level analysis in Occasional 
Paper No. 12), distributional statistics and correlational analysis both 
univariate and multivariate, and special capabilities for discrimination, data 
transformation, and taxonomy. Simulative analysis includes both static 
simulation, which may be required by certain types of inquiry, and dynamic 
simulation of the Forrester type. Analysis considerations will be discussed 
both within and between the descriptive and simulative categories. 

The discussion of descriptive analytic considerations will also be struc- 
tured in terms of the spaces and elements of MISOE delineated in Occasional 
Papers No. 1, No. 2, and No. 4, as expanded and modified in Occasional Paper 
No. 5. Part Four discusses analysis within and across such MISOE subsystems. 
These discussions will take cognizance of the dimension of functional levels 
(educational sectors such as secondary, adult, MDT; programs, blocks, and units),
the dimension of management level (state, region, or local), and the dimension
of management scope (social agencies, all education, occupational education
across programs, and occupational education within programs). Generally,
however, it is assumed that beyond the entry level, where the information storage
and retrieval system (Occasional Paper No. 3) identifies such information, and
where aggregation will be accomplished, the nature of a particular inquiry will
imply the functional level and analysis units without need for major operating
decisions in connection with choices of models and other analysis tools.



Part II: General Analysis Considerations 

This part considers analysis topics which are not specific to particular 
inquiries or their associated relevant MISOE subsystems. Thus, they are likely
to be involved in any analytic operations to some degree. Included are concerns
about analysis tools and other such topics as sources and control of inferential
errors, computer software, and special problems involving "mixes", cohort re-
placement, and followups. General analysis considerations also include the
selection of variables and data instruments, a topic which will not be discussed
here because it is the subject of Occasional Paper No. 10.
Sources and Control of Inferential Errors 

The general utility of MISOE is to be that of a management tool for the 
appraisal of existing policies and as a guide to decision-making and policy 
change at all management levels. A system of this complexity contains a number 
of hazards that may arise anytime during operation from initial inquiry to the 
final interpreted response of the system. This section is concerned with such
hazards that arise in analysis operations and which may distort the results of
analysis in such a way that a false or misleading picture of "reality" is in-
ferred (inferential error). The recommendations to management may then be
faulty and even dangerous, either to individuals in the state system under
study, to programs, to the managers who believe and act on the recommendations
from MISOE, or ultimately, to MISOE itself. In addition to the obvious issue
of professional ethics involved, the credibility and therefore the survival
of MISOE are at stake.

The major sources of inferential error in analysis arise from sampling 
error, measurement error, and processing error. Errors may be either random or
nonrandom in either their occurrence, or in their analytic consequences. Much 
of the classical literature in psychometrics and the other behavioral sciences 
has limited applicability in complex programs using data obtained from diversi- 
fied sources and over extended periods of time. 

The sources and control of sampling error will be discussed in somewhat 
greater depth in Occasional Paper No. 12. Suffice it to say here that: 

1. random sampling errors are controlled by sample size and by stratifi- 
cation; 

2. nonrandom sampling errors are controlled partly by the logistics of 
data collection, and partly by stratification; 

3. the effects of both kinds of error are partially controllable by
suitable procedures for weighting data to make them more nearly 
representative of the populations of interest; 

4. bias from nonrandom sampling, including that from nonresponse to mail 
or phone contact of subjects, is a much more serious concern, and more
difficult to control, than the effects of random sampling fluctuations. 
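
The weighting idea in item 3 can be sketched as follows. This is a minimal illustration, not the procedure to be specified in Occasional Paper No. 12: it assumes the population count of each stratum is known from the census, and the function names, strata, and figures are hypothetical.

```python
def stratum_weights(population_counts, sample_counts):
    """Weight per stratum = population count / number of sampled units."""
    return {s: population_counts[s] / sample_counts[s] for s in sample_counts}


def weighted_mean(records, weights):
    """records is a list of (stratum, value) pairs."""
    total = sum(weights[s] * x for s, x in records)
    weight_sum = sum(weights[s] for s, _ in records)
    return total / weight_sum


# Hypothetical figures: the adult stratum is heavily oversampled.
population = {"secondary": 900, "adult": 100}
sample = [("secondary", 10.0), ("secondary", 20.0),
          ("adult", 50.0), ("adult", 70.0)]
sample_counts = {"secondary": 2, "adult": 2}

weights = stratum_weights(population, sample_counts)
estimate = weighted_mean(sample, weights)
unweighted = sum(x for _, x in sample) / len(sample)
```

In this illustration the unweighted mean is pulled toward the oversampled stratum, while the weighted estimate reflects the population proportions. Weighting of this kind does not remove bias from nonresponse within a stratum, which is the concern of item 4.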

Concern about the reliability of measurements, i.e., the consistency of 
data obtained on replicated measurement, pervades the classical literature in 
psychometrics. It is usually not difficult to obtain good estimates of relia- 
bility of instruments in common use and of good repute, especially for the 
standard tests of ability and achievement. For the most part such instruments 
have sufficiently high reliability to ensure the usefulness of the instrument,
and in cases where coefficients are not very high, special procedures are
available to reduce the likelihood of inferential error. In large scale pro-
grams (TALENT, CIRP of ACE, and MISOE), where many different kinds of measure-
ments may enter a particular analysis, these measures having varying degrees
of reliability, and where the reliabilities may also vary with respect to
various groups of subjects, the risks of inferential errors from measurement
error may interact with one another in such a manner as to render the results
quite uninterpretable. In the case of categorical variables used to define
subgroups of the population under study, measurement error leads to misclassi-
fications. Relations between such variables and other variables are not neces-
sarily attenuated but may be spuriously inflated. In regression analysis,
different reliabilities across the variables may not only disturb the relative
size of regression weights, but may even reverse their relative order of magni-
tude. This can be serious where such weights are later used as auxiliary
modifiers of rates in dynamic simulation, or used in rendering some judgement
about the relative importance of the regression variables in prediction, or
used in accounting for variance in a dependent variable. The ACE Research
Report, Measurement Error in Social and Educational Survey Research, deals with
many of the issues raised here, and should be regarded for MISOE development
purposes as an appendix to this Occasional Paper. Topics discussed include:

1. the meaning of measurement error, 

2. the effects of error on analysis and interpretation,

3. a review of the pertinent literature (and a rather extensive biblio-
graphy),

4. concern about sources of error in different item types, formats, and 
contents, 

5. error in those initial data-processing operations which have the same 
effects in analysis as measurement error, 

6. empirical data on reliability of a variety of questionnaire survey
item types.




Table 3 in Occasional Paper No. 5 shows staff concern for the reliability
of instruments. Unless there is some sole available measure of a very important
variable, it should not be necessary for MISOE to deal with any continuous
variable with internal consistency or retest r_tt less than .75, even in the
personality and attitude domains. Correlations involving continuous variables
with r_tt in the .75-.95 range should be corrected for attenuation. This cor-
rection can be ignored when r_tt is greater than .95. Although somewhat ar-
bitrary, and set a little higher than necessary for exploratory research purposes,
these cutpoints and associated recommendations are made to reduce the risk of
letting measurement error have undue influence on outcomes from descriptive
analysis or upon parameters in dynamic simulation. In the case of dichotomous
categorical variables, phi coefficients used as measures of reliability are a
function of the base rate or popularity of the item and their values are con-
strained to a range of less than 0-1. When used to judge the reliability of the
variable, phi should be divided by the maximum value that phi can have for
the associated base rate. However, point-biserial phi coefficients as measures
of correlations between two variables should not be corrected for base rate, nor
should point-biserial correlations between dichotomous and continuous variables,
when these coefficients are used in correlational analysis (the use of normal
biserial or tetrachoric coefficients is not recommended). Further discussion
of reliability issues will be made when required in the later and more detailed
discussions of analysis within and among MISOE subsystems and in simulation
aspects of analysis (Parts Four and Five).
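
The two adjustments named above can be sketched in a few lines. The paper does not print the formulas, so this sketch assumes the standard ones: the classical correction for attenuation, r_xy / sqrt(r_xx * r_yy), and the usual ceiling on phi for two dichotomies with unequal base rates. Function names and figures are hypothetical, and base rates are assumed to lie strictly between 0 and 1.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the true-score correlation from an observed correlation
    and the reliabilities of the two measures."""
    return r_xy / math.sqrt(r_xx * r_yy)


def phi_max(p1, p2):
    """Largest phi attainable for two dichotomies with base rates p1, p2."""
    low, high = sorted((p1, p2))
    return math.sqrt(low * (1 - high) / (high * (1 - low)))


def reliability_from_phi(phi, p1, p2):
    """Phi divided by its ceiling, for judging reliability of a dichotomy."""
    return phi / phi_max(p1, p2)


# Hypothetical figures: an observed r of .40 between two measures that
# each have reliability .80, and a retest phi of .40 on an item whose
# base rate was .20 on one occasion and .50 on the other.
corrected_r = correct_for_attenuation(0.40, 0.80, 0.80)
rescaled_phi = reliability_from_phi(0.40, 0.20, 0.50)
```

When the two base rates are equal the ceiling is 1 and the rescaling leaves phi unchanged, which is why the adjustment matters chiefly for items whose popularity shifts between occasions.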

Errors occurring in the processing of information can also have serious 
inferential consequences. Processing includes the choice of models and algorithms
to be discussed in Part Three. It is recommended that, as an integral part
of MISOE development, some preliminary analyses using available or even
fictitious data be run to thoroughly debug all computer software systems,
including such program modifications as may be required, to check out all
information storage and retrieval aspects of the system including file
manipulation and documentation, and to check out the costs, time, and logistic
aspects of operating MISOE, and the intercommunicability of all MISOE sub-
structures. All of this is a part of "tooling up" and may save the staff
much later embarrassment or even grief. Similar considerations apply to any
later expansion of the system involving additional data processing operations.
later expansion of the system involving additional data processing operations. 

The matter of documentation is not confined to data processing. In
operation, an inquiry file should indicate not only the nature and source of
the inquiry and the final response, but also any interim operations of analysis,
so that accumulated experience can be referenced when faced with new inquiries
and records of service experience can be used to train new personnel as the
system expands or personnel changes occur.
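
One way to picture such a file is a record that carries the inquiry, its source, a running log of interim operations, and the final response. The sketch below is only an illustration of that idea; the class name, field names, and sample entries are hypothetical and do not describe an actual MISOE file layout.

```python
from dataclasses import dataclass, field


@dataclass
class InquiryFile:
    source: str                      # management level that asked
    question: str                    # the inquiry as received
    operations: list = field(default_factory=list)  # interim analysis steps
    response: str = ""               # final interpreted response

    def log(self, step, detail):
        """Record one interim operation in the order it was performed."""
        self.operations.append((step, detail))


# Hypothetical usage.
record = InquiryFile(source="state", question="Which programs show the lowest dropout?")
record.log("hypothesis", "dropout rate differs across programs")
record.log("model", "multiple linear regression")
record.response = "Interpreted results reported to inquiry source."
```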
General Software Considerations 

This section is addressed to a few general considerations about the
data processing and computer software requirements and capabilities of MISOE.

At the data entry level, test answer sheets, questionnaires, and other
protocols must be processed in such a way that information can be read into
various storage systems, which are either internal or external to the computer
facilities. Where such documents require preliminary coding and/or keypunching,
[illegible] verification is strongly recommended. Staff consideration should
also be given to the feasibility of using optical scanning and/or optical
character reading operations. Such procedures are usually at least as reliable,
and sometimes more so, than verified keypunching and result in the information
from the document appearing on magnetic tape, as coded in accordance with user
specifications. Moreover, the tape may contain data on variables that can be
generated from the directly scanned responses, and additional "summary" tapes
may be generated which contain aggregate data. Where feasible, the systems are
quite flexible and provide much convenience for the user. Feasibility is a
function of the volume and of the design of the protocol.

It is recommended that staff contact be made with representatives of the
following to ascertain in more detail the feasibility of selective use of such
services in MISOE operations at the data entry level:

Intran Corporation 
4555 77th Street 
Minneapolis, Minnesota 55435 
612-929-4691 

Contact: Mr. Gerald Koch, President or Mr. Dennis Dillon 

National Computer Systems 
4401 W. 76th Street 
Minneapolis, Minnesota 55435 
612-920-6370 

Contact: Dr. Robert J. Panos, Director of Survey Research
Services 

National Scanning Incorporated
1110 Morse Road 
Columbus, Ohio 42339 

Contact: Mr. Robert Hopkins, Director of Marketing. 
The first two of these have been successfully used by ACE; the performance 
quality of NSI is less familiar. The first two companies use the same 
scanning principles based on transmitted light and require respondents to use 
a lead pencil (about No. 2½) for marking their responses. The representative
of the last company claims that they can reliably read marks made with ball
point or felt-tipped pens as well. Their system is based on a reflected light
principle. 

Software for the digital computer facility includes compilers and program
packages. In addition to Fortran and Dynamo compilers, Cobol can be useful 
for file record counting and simple manipulations such as match-merge and pull- 
off of subfiles, and for certain kinds of accounting operations. Data-Text 
is a general program capable of generating more special programs for producing
frequency distributions, distribution statistics and cross-tabulations in
several dimensions; it is alleged to be imminently available in Fortran from
Harvard University Computing Laboratory (contact Dr. David Armour).

The system of analysis programs, known as BIOMED and available in Fortran 
from Dr. W. J. Dixon of UCLA, is very general and provides many options. One
particular feature is the TRANSGENERATION option that permits variables to be 
transformed in terms of various functions, including crossmultiplication with 
other variables prior to entering a particular analysis algorithm. DYNAMO 
contains similar options. 

Such general compilers and program packages, even when available, need to
be adapted to the particular hardware configuration of the computing facility 
to be used by MISOE. The staff should ensure that these capabilities, including 
special routines for file sorting, handling multireel files, blocking and un- 
blocking tapes, are well adapted, debugged, and documented, and the system 
monitor and all compilers contain thorough diagnostic capability. 

With such software capability, the need for ad hoc programming should be
minimal. Nevertheless, MISOE operations may encounter special requirements
that imply special programming. Many of the subroutines in DYNAMO can be
adapted, if necessary, to table lookup, plotting, and similar needs. BIOMED
also provides some capabilities in these areas. It may also be anticipated
that a user, upon studying a relationship plotted from empirical data and for
which the function is not known, will need to fit a function to the plotted 
information, so that the equation or some of its parameters could be used in 
dynamic space. Such curvefitting capability may be rather important for MISOE. 
There will be a need for special programming for developing sampling weights 
and ensuring their additions to data records in such a way that the weights may 
be used selectively in analysis. This matter will receive further attention in 
a later section of this paper and in Occasional Paper No. 12. 
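
The curve-fitting need mentioned above can be illustrated with the simplest case, an ordinary least-squares straight line. This is a sketch only: the paper does not say which functional forms MISOE would fit, and the function name and data points are hypothetical. Other forms (e.g., exponential growth) are often handled by transforming the variables first and then fitting a line.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical plotted points lying exactly on y = 1 + 2x.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

The fitted slope or intercept is the kind of parameter that could then be carried into an auxiliary equation in dynamic space.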




Missing Data Problems 

For a variety of reasons the data record for an observation unit may be 
incomplete. A student is absent on the date of testing; he omits some items
on a questionnaire. In process and product spaces, local program records may
not be complete or completely reported, although the staff may be able to
elicit the missing information by phone. Incomplete records may be encountered
in impact space (crime records not complete; followup respondents omit items,
etc.). Where whole records are missing, sampling weights may need adjustment.
This section, however, is concerned with sporadic losses of information (i.e.,
more or less random and not too frequent). On some sensitive items like family
income, losses may run as high as 10 percent of the respondents. Where less
than 1 percent losses are encountered, the analytic consequences may be negli-
gible.

If the computer facility distinguishes zeros from blanks, the coding of 
data could use zero to code legitimate "zero", "nothing", "no", or "none" 
responses, and use "blank" for missing information. If the computer facility 
does not make such a distinction, or the staff wishes to ensure that files
could be processed on external systems (e.g., for backup if the main facility
is "down"), then zero should be used for missing data codes, dichotomous variables
coded 2 or 1 (instead of 1 or 0), and all other coding of legitimate information
subjected to a similar transformation to avoid using zero except to indicate
missing data. Variances and covariances are unaffected by such transformation;
only counts, aggregates (ΣX, ΣXY, and ΣX²) and means are affected and are easily
adjusted for reporting purposes. Where means are used to initiate level para-
meters in dynamic simulation, it is necessary to correct the level for the coding
translation.
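
The coding translation can be checked with a small example. The sketch below assumes a shift of +1 on every legitimate code so that zero is free to mean "missing"; the names and the sample responses are hypothetical.

```python
from statistics import mean, pvariance

MISSING = 0   # reserved code for missing information
SHIFT = 1     # every legitimate code is raised by this amount


def encode(value):
    """Shift a legitimate code; None (no response) becomes the missing code."""
    return MISSING if value is None else value + SHIFT


def decoded_mean(coded_values):
    """Mean of the non-missing codes, corrected back to the original scale."""
    present = [v for v in coded_values if v != MISSING]
    return mean(present) - SHIFT


raw = [0, 1, 1, None, 0, 1]          # a 0/1 item with one omission
coded = [encode(v) for v in raw]     # the same item coded 1/2, with 0 = missing

raw_present = [v for v in raw if v is not None]
coded_present = [v for v in coded if v != MISSING]
```

The variance of the shifted codes equals that of the original codes, while the mean must have the shift subtracted before it is reported or used to initiate a level.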

There are three general approaches to dealing with missing data in analysis.
One is to leave the missing data coded as such on the files and to modify analysis
programs to detect and bypass the code when cumulating sums. This is not
recommended for MISOE. The other two methods replace missing information with
estimated values. In one, an average value is computed across all records on
the file for which data are available for the given variable. For categorical
variables, the code for the modal category may be posted; for continuous
variables means (sensitive to complete distribution asymmetry) or medians
(insensitive to asymmetry) may be used. This procedure assumes that missing
information is distributed symmetrically about the replacement value, an
assumption that is usually false. The procedure also introduces some small
attenuation in variance and covariances, but is a practical solution for MISOE,
which can be accomplished during file editing operations. The final method is
a refinement requiring considerable additional time and computing effort.
Different replacement values are posted depending on the values of other, pre-
sumably related variables either by stratification on the other variables or by
using regression estimates. This reduces the attenuation of variances and co-
variances and allows for asymmetric losses of information from the total dis-
tribution. However, if missing data is sporadic and not too frequent, such
refinements are probably unnecessary. Moreover, if the overall average is
supplied for each file, when developed and edited, some stratification is
implicitly introduced by separate files being developed from separate sources,
for different MISOE subsystems, and levels.
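
The second method, posting a file-wide average in place of missing values, can be sketched as follows. The function name and the sample values are hypothetical; missing entries are represented here by None rather than by a numeric code.

```python
from statistics import mean, median, mode, pvariance


def impute(values, kind="median"):
    """Replace None with the mean, median, or mode of the available values."""
    present = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[kind](present)
    return [fill if v is None else v for v in values]


# Hypothetical skewed continuous variable with two omissions.
income = [4, 6, None, 8, 30, None]
filled = impute(income, "median")

# Hypothetical categorical variable: post the modal category.
program = ["a", "b", None, "a"]
filled_program = impute(program, "mode")

variance_before = pvariance([v for v in income if v is not None])
variance_after = pvariance(filled)
```

In this example the variance after replacement is smaller than the variance of the available values, which is the attenuation the text describes; with sporadic losses the effect is small.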
The Formulation of "Mixes"

The use of the term "mix" in Occasional Paper No. 5 to denote certain 
patterns or configurations of student characteristics, process treatments, and 
product achievements raises issues with analytical implications. By far the 
most general and consistently useful formulation of a "mix" is the multivariate 
score vector (X_1, X_2, ..., X_n), which underlies most models for multivariate
statistical analysis. In fact a data record (without its ID number and storage
location information) is such a vector, any subset or linear transformation of
which can express a "mix" analytically. Geometrically such a vector can be
represented by a point in multivariate space, the mean vector is the vector of
means (centroid of the swarm of points), and the space can be divided off into
regions by various methods to provide codable groups of "similar mixes". (See
Multivariate Statistics for Personnel Classification, Rulon, Tiedeman, Tatsuoka,
and Langmuir, 1967; the logic is also applicable to entities other than persons.)
All of which makes this the most attractive, general purpose formulation of
"mixes" for analytic purposes, and applicable across the interfaces of MISOE
subsystems. By this reasoning, the various clustering and "pattern analytic"
methods (e.g., Tryon, Coombs) are not recommended for incorporation into the
MISOE analytic capability, but in any case can be added later on, if need arises,
as one kind of system expansion. The Guttman scaling approach discussed 
in Occasional Paper No. 5 may prove quite useful; this is discussed further 
in Part Four. 
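
The score-vector formulation can be made concrete with a short sketch: each record is a list of scores, the centroid is the vector of column means, and a "mix" may be any subset or weighted combination of the scores. The function names, weights, and records are hypothetical.

```python
def centroid(records):
    """Mean vector of a set of equal-length score vectors."""
    n = len(records)
    return [sum(column) / n for column in zip(*records)]


def linear_mix(record, weights):
    """One linear transformation of a score vector: a weighted sum."""
    return sum(w * x for w, x in zip(weights, record))


def subset_mix(record, indices):
    """A 'mix' formed by selecting some of the scores."""
    return [record[i] for i in indices]


# Hypothetical records with three scores each.
records = [[1.0, 2.0, 3.0], [3.0, 6.0, 9.0]]
center = centroid(records)
contrast = linear_mix(records[0], [1, 0, -1])
pair = subset_mix(records[1], [0, 2])
```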

Any pattern can be coded and treated as an analytic datum regardless of
how that pattern was defined. The pattern variable is dichotomous: X takes
one value if the pattern applies to the observation unit, another value if it
does not. Such pattern coding in MISOE will probably be involved mostly at 
the higher analysis levels after factor analysis, discriminant analysis, or 
hierarchical grouping analyses have reduced the dimensions of the patterns; the 
number of possible patterns increases astronomically with the number of dimensions 
and the number of categories on each dimension. 
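
The dichotomous pattern coding and the growth in the number of possible
patterns can be illustrated with a minimal Python sketch. The record layout, the
variable names (program, sex, grade), and both helper functions are hypothetical
and are introduced here only for illustration.

```python
from math import prod

def pattern_indicator(record, pattern):
    """Dichotomous pattern variable: 1 if the pattern applies to the record, else 0."""
    return 1 if all(record.get(name) == value for name, value in pattern.items()) else 0

def count_possible_patterns(categories_per_dimension):
    """Number of distinct patterns = product of the category counts per dimension."""
    return prod(categories_per_dimension)

# Hypothetical record and pattern (names and values are illustrative only).
record = {"program": "welding", "sex": 1, "grade": 11}
pattern = {"program": "welding", "grade": 11}

print(pattern_indicator(record, pattern))   # 1
print(count_possible_patterns([2] * 10))    # 1024
print(count_possible_patterns([5] * 10))    # 9765625
```

Ten dimensions with only five categories each already yield nearly ten million
possible patterns, which is why dimension reduction would ordinarily precede
pattern coding.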
Followup Problems 

On completion of occupational education (or non-OE), a student moves back
into the larger societal space, and his behavior reflects the impact of his
educational experience on him (product) and, in turn, his impact on society as
a taxpayer, producer, consumer, voter, or felon. The anticipation of assessing
certain impact data through mail and/or phone followup contacts of "alumni" requires
that MISOE lay the groundwork for such contact while the student is still in
the pipeline. The student's name and home address (or "address where he can
always be reached") should be obtained on entry (so that dropouts can also be
followed up) and updated on exit from educational programs. Provision should
also be made during each followup inquiry for updating the address for the next
followup of the same subject.

The name and address file should contain the subject's ID number, date of
birth, and sex to help differentiate the James Joneses, those with such first
names as Vivian or Shirley, and those with first names that might be confused
through keypunch errors: Carl vs. Carol vs. Caroll, Marion vs. Marian, etc.
The name and address file should, of course, be maintained separately from, and
at a higher level of security than, the coded data files. The use of separate
ID numbers for data and name and address files, with a link file, provides
additional confidentiality control at somewhat higher cost and staff operating
inconvenience. An extensive literature has developed regarding confidentiality
and related ethical and legal problems with data banks.
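
The separate-ID arrangement can be pictured with a small Python sketch. The
three dictionaries stand in for three separately stored files; all IDs, field
names, and values are hypothetical, and the function name is an assumption made
for illustration rather than part of any MISOE specification.

```python
# Hypothetical stand-ins for three separately stored files.
# Coded data file: keyed by a data ID, contains no identifying information.
data_file = {"D9001": {"program_code": 17, "posttest": 42}}

# Name and address file: keyed by a different ID, held at higher security.
name_address_file = {
    "N0345": {"name": "James Jones", "birth_date": "1955-03-02", "sex": "M",
              "address": "12 Elm St, Winchester, MA"},
}

# Link file: the only place where the two ID systems are connected.
link_file = {"D9001": "N0345"}

def contact_for(data_id, link_file, name_address_file):
    """Resolve a data-file ID to contact details; requires access to the link file."""
    return name_address_file[link_file[data_id]]

print(contact_for("D9001", link_file, name_address_file)["name"])  # James Jones
```

Anyone holding only the coded data file sees neither names nor the contact ID,
which is the added confidentiality control; the extra lookup is the operating
inconvenience mentioned above.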

It would be prudent for the staff to maintain a backup copy of the name 
and address file and of each edited data file, whether a basic or followup 
file, and to keep the backup files in a separate location. 

The creation of merged files containing basic and followup respondent
data may be anticipated. This will involve special weighting procedures to 
adjust the data for bias due to nonresponse to the followups. These issues 
will be further delineated in Occasional Paper No. 12. The various analysis 
issues, including those involving interfacing MISOE subsystems, will apply to 
the followup data and operations on the merged files. Models and computing 
procedures will be similar to those for analysis of basic files except for 
changes of variables and their associations with MISOE subsystems. 



Implications of "Initial Data Points" and Cohort Replacement

Rather than initiating MISOE solely with an input cohort in each process
channel and following it through time (the purely longitudinal approach), a
strategy that requires waiting for product and impact data to become available, the
staff has proposed combining longitudinal and cross-sectional designs. Thus,
data will be collected initially not only in input space for an input cohort,
but also in current process space and product space at whatever stage earlier
cohorts may be in a given program, and in impact space for recent "alumni".
The latter includes an initial prototype followup survey using addresses
available in school records. It may be anticipated that the contact and response
rate will be somewhat lower than that obtained in followups based on systematically
established and maintained name and address files. Nevertheless, the information
will be useful for obtaining initial estimates of levels and variations
in the impact space and for obtaining information useful in planning later
followups of initial input cohorts. This will also be generally true of the
initial cross-sectional data, which will not be matchable and will be only partially
interfaceable across MISOE subsystems. Estimates of distribution parameters
within subsystems can be established, as well as some time trend information.
With this arrangement, it will also be possible to estimate correlations among
variables between adjacent MISOE elements earlier than would be the case in a pure
longitudinal design. Nevertheless, a sound dynamic simulation capability may be
limited to relatively simple subsystems and associated flow models. Level and
rate information will be available, but only gradually will auxiliary modifiers
of rates be systematically available from inter-element regression description.

The staff also proposes to replace an input cohort on each program only 
after a current cohort has completed the program rather than study every input 
group. This decision is logistically sensible, but it should be recognized
that it may reduce the comparability of information across programs of different 
length. 






Some Functional Issues In Data Processing and the Computer Facility 

Two letters sent to the staff following the conference held in February,
1972 contain comments on Occasional Papers No. 1, 2, and 4, with a promise to
comment on Occasional Paper No. 3. This section fulfills that promise in the
present context of discussing general analytic considerations. Despite subsequent
staff work on MISOE development, reflected in Occasional Papers No. 5
and 6, most of the comments in the two letters remain valid and reasonably
consistent with material in this paper. Where exceptions occur, present commentary
should override that in the letters. Nevertheless, those letters should be
regarded as attachments to this paper, as referenceable documentation in MISOE
development.

Occasional Paper No. 3, labeled "very tentative", is addressed to the
functional problems of handling information in connection with the computer
facility. The organization of the data-entry subsystem is basically sound and
includes provision for coding the stratification cell associated with a basic
data record. Provision should be made in the data record layout for appending
weighting factors, the number and nature of which will be more specifically
defined in Occasional Paper No. 12.

The stratification cell code per se permits linkage only with the original
sampling stratification cell weights. It is anticipated that one or more
differential weights will be required for incomplete random sampling within
schools and for correcting respondent data from followups for nonresponse
bias. Not all of the weights will be needed in all analyses. Different kinds
of analyses will require particular subsets of weighting factors. The weight
actually applied to a particular record will typically be the product of
individual weights in the subset. E.g., a student record* may contain three
basic weights (possibly more), $W_1$, $W_2$, $W_3$. The weight applied in a particular
analysis might be $W_1W_2$, $W_1W_3$, or $W_1W_2W_3$. As a minimum, the basic weights
($W_1$, $W_2$, and $W_3$) should be on the record; it will be convenient for the product
weights to be posted to individual records as well. Otherwise the product weight
would have to be formed each time it was used in analysis, and provision for this
incorporated either in information retrieval operations or in computer program
modifications.

*Ditto for a process record for a program within a school.

When a cohort replacement has occurred at the end of process, the input
data for the new cohort are added to the system. However, the impact data for
the original cohort may not yet be available, so that retiring their records
from the entry-level subsystem may be premature. Controlling product-impact
analysis for input will require the capability of developing input-impact
correlations as well as the previously computed input-process correlations,
presumably retained in the analysis subsystem. Moreover, the development of
weights to correct followup data for nonresponse bias will involve special
analysis and comparison of the input characteristics of respondents and
nonrespondents. Given adequate disc storage, entry-level data may be retained
until no further use is expected; this solution hardly seems feasible in view
of the expectation of conducting multiple followups over several years. The
information can be removed from disc storage and stored on tape for reading
into the computer processing unit, being sure that the complete ID, storage-
retrieval, and linkage codes are retained in the tape record layout.

Provision must also be made not only to print out cumulations and averages,
but also to store them for analysis, as indicated in Figures 1, 3, 4, and 5 of
Occasional Paper No. 3. However, the sample data computations of basic statistics
must consider the weighting factor (product of any relevant weights) as
follows:

\begin{align*}
N &= \sum W \\
\sum X &= \sum WX \\
\sum X^2 &= \sum WX^2 \\
\sum XY &= \sum WXY \\
\bar{X} &= \frac{\sum WX}{\sum W} \\
N\sum X^2 - \left(\sum X\right)^2 &= \sum W \sum WX^2 - \left(\sum WX\right)^2 \\
N\sum XY - \sum X \sum Y &= \sum W \sum WXY - \sum WX \sum WY,
\end{align*}

where $W = 1$ for census data and
the expressions on the left are population estimates from sample data. In
general, W will not be a constant across summation and therefore cannot be
applied externally to a summation, e.g., $W\sum X$. Anywhere the population
parameters are known from census data, they should be used in preference to the
sample estimators.
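
As an illustration of these weighted accumulations, the following Python sketch
forms each record's weight as the product of a chosen subset of its basic weights
and then builds the weighted sums. The record layout, the field names (x, y, W1,
W2), and the function names are assumptions made for the example; it is a sketch
of the arithmetic, not a description of any existing MISOE program.

```python
from math import prod, sqrt

def applied_weight(record, weight_names):
    """Weight applied to a record: product of the selected basic weights.
    An empty selection gives 1, which is the census-data case."""
    return prod(record[name] for name in weight_names)

def weighted_sums(records, x, y, weight_names):
    """Accumulate N, sum X, sum Y, sum X^2, sum Y^2, sum XY with W inside each sum."""
    sums = {"N": 0.0, "X": 0.0, "Y": 0.0, "XX": 0.0, "YY": 0.0, "XY": 0.0}
    for rec in records:
        w = applied_weight(rec, weight_names)
        sums["N"] += w
        sums["X"] += w * rec[x]
        sums["Y"] += w * rec[y]
        sums["XX"] += w * rec[x] ** 2
        sums["YY"] += w * rec[y] ** 2
        sums["XY"] += w * rec[x] * rec[y]
    return sums

def weighted_mean(sums):
    return sums["X"] / sums["N"]

def weighted_correlation(sums):
    """Correlation built from the stored weighted accumulations."""
    cov = sums["N"] * sums["XY"] - sums["X"] * sums["Y"]
    var_x = sums["N"] * sums["XX"] - sums["X"] ** 2
    var_y = sums["N"] * sums["YY"] - sums["Y"] ** 2
    return cov / sqrt(var_x * var_y)

# Hypothetical sample records; the first stands for two population units.
records = [
    {"x": 1.0, "y": 2.0, "W1": 2.0, "W2": 1.0},
    {"x": 2.0, "y": 4.0, "W1": 1.0, "W2": 1.0},
    {"x": 3.0, "y": 7.0, "W1": 1.0, "W2": 1.0},
]
sums = weighted_sums(records, "x", "y", ["W1", "W2"])
print(sums["N"], weighted_mean(sums))  # 4.0 1.75
```

Because only the accumulations are needed to form means, variances, and
correlations, storing them (rather than recomputing from raw records) is what
allows later analyses to start from the weighted sums.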

Most available analysis programs do not contain the weighting option.
Some regression programs contain options for inputting either raw data or a
previously computed correlation matrix. The necessity for adapting and modifying
analysis programs, discussed in earlier sections, may be somewhat simplified
with basic weighted accumulations held in hardware storage; nevertheless, having
such capability in the programs as branching options will maximize flexibility.

Considerable attention is still required to the general issue of the degree
to which level two analysis products should be kept in computer storage. Whole
correlation matrices, some of large order, may be developed for repeated but
infrequent use. Moreover, the size of the matrices will grow as longitudinal
data become available. One solution is to store these externally on tape and
store regression results internally for easy call in dynamic simulation.*

The next part of this occasional paper discusses the MISOE repertoire of
models and associated computer algorithms. Therein will also be discussed the
problem of variable selection and variable elimination, and the issue of relative
importance of contributions that figures S'-S in Occasional Paper No. 3 raise. 
The tentative formulation, not wholly satisfactory, served to lay these matters
on the table and to indicate their general place in the MISOE development
process. Comment on the optimization and simulation section of Occasional Paper
No. 3 will be integrated with that on Occasional Paper No. 6, and deferred to
Part Five of this paper.

*This requires careful tagging of this particular regression analysis from 
which such parameters were developed. 




Part III. The Analysis Repertoire of MISOE: Models and Algorithms 

This part of the paper continues the discussion of general analysis
considerations across inquiries and MISOE subsystems with explicit attention to
the repertoire of analytic models and their associated algorithms (what Occasional
Paper No. 3 figures call "analysis options"). There is no need to elaborate
further the discussion of the counting, aggregating, and distributing options,
or to enter into extensive discussion of univariate distribution statistics. The
system needs the capability of outputting weighted cross-tabulations (a capability
provided by Data-Text or modifications thereof). Except for capability of
computing phi coefficients and chi-square statistics (from weighted frequencies),
the need for nonparametric statistics is judged to be low and therefore of low
priority for MISOE. The ensuing discussion will therefore be focused on more
complex analysis options. For each model discussed, the approach will be in
terms of what the model does to accomplish what purposes, its relevance to, and
therefore priority in, MISOE development and operations, and the inferential
hazards in its application. It will be assumed that suitably weighted correlation
matrices have been developed and stored under readily retrievable conditions
for those models requiring them. Most general regression programs (e.g., BIOMED
02R) contain subroutines for selecting a subset of variables for a particular
analysis (cf. Boruch and Dutton's program VARELIM in Educational and Psychological
Measurement, 1970, 30, 719-21).
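
The phi coefficient and chi-square computed from weighted frequencies can be
sketched as follows for two dichotomous variables. The (x, y, weight) record
layout and the function names are hypothetical. Whether the weights should first
be rescaled so that they sum to the actual sample size is a design choice the
text does not settle; it matters because chi-square grows in proportion to the
weighted N.

```python
from math import sqrt

def weighted_2x2(records):
    """Accumulate weighted cell frequencies for two 0/1 variables."""
    cells = {(1, 1): 0.0, (1, 0): 0.0, (0, 1): 0.0, (0, 0): 0.0}
    for x, y, w in records:
        cells[(x, y)] += w
    return cells

def phi_and_chi_square(cells):
    """Phi coefficient and the 1-df chi-square (N * phi^2) from a 2x2 table."""
    a, b = cells[(1, 1)], cells[(1, 0)]
    c, d = cells[(0, 1)], cells[(0, 0)]
    n = a + b + c + d
    phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return phi, n * phi ** 2

# Hypothetical weighted records: (x, y, weight).
records = [(1, 1, 30.0), (1, 0, 10.0), (0, 1, 10.0), (0, 0, 30.0)]
phi, chi_square = phi_and_chi_square(weighted_2x2(records))
print(phi, chi_square)  # 0.5 20.0
```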
Multiple Linear Regression 

With the possible exception of DYNAMO, the most general and most powerful
analytic tool in the MISOE repertoire will be a highly general regression
capability. The generality and power of the multiple linear regression model
result from an appreciation of just what is "linear" about the model, and from
the availability of computing algorithms which permit formulating a problem in
a specified model or developing the most efficient specified model for prediction
from an a priori set of independent variables. Both sources of generality are
important for MISOE. The general form of the model is: $Y = \sum_i b_i X_i + C$.

The "linearity" of the model refers to the algebraic form of the equation
in respect to the model parameters to be estimated (the regression weights, $b_i$),
and not to the variables, $X_i$. In this context "linearity" has no reference to
the shape of the scatterplots among the $X_i$ or between $X_i$ and $Y$. Therefore, $X_i$
may be the value of any function of an observed variable or even a function of
the other variables in the model; e.g., the parabolic polynomial regression is
linear in the above sense:

$$Y = b_1 X_1 + b_2 X_1^2 + C$$

Moreover, there is no restriction that the $X_i$ be continuous, quantitative variates;
membership of an observation unit in a qualitative category (e.g., subject is
male) may be indicated by dichotomous coding (1 if yes, male; 0 if no, female;
but recall the 2/1 transformation discussed earlier if zero is to be reserved
for missing data). This notion also applies to canonical and discriminant
regression (Tatsuoka). The power of the tool is further enhanced by the fact
that a wide variety of hypotheses may be tested by comparing the $R^2$ for a full
model with that for a reduced model consisting of an appropriate subset of the
variables in the full model. A simple function of this difference is distributed
as the F ratio familiar in ANOVA. For elaboration and numerous examples
of the power of this analytic tool, the staff should be familiar with Applied
Multiple Linear Regression (Bottenberg and Ward, March 1963, PRL-TDR-63-6, 6570th
Personnel Research Laboratory, Lackland AFB, Texas), and with Research Design
in the Behavioral Sciences: Multiple Regression Approach (Kelly, Beggs, and
McNeil, 1969, Southern Illinois University Press). One consequence of this
for MISOE is that no separate programs are required for ANOVA and ANCOVA, which
can be readily formulated in linear regression terms. Moreover, unlike classical
ANOVA, the regression approach readily handles the "nonorthogonal case".
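
The comparison of a full with a reduced model can be sketched in Python. The F
formula below is the standard one for nested regression models; the function
names, the example figures, and the particular predictors in `design_row` are
illustrative assumptions, and the 1/0 coding shown would need the 2/1
transformation if zero is reserved for missing data.

```python
def f_for_model_comparison(r2_full, r2_reduced, k_full, k_reduced, n_obs):
    """F ratio for the drop in R^2 when moving from a full to a reduced model.

    k_full and k_reduced count predictors, excluding the constant term.
    """
    df1 = k_full - k_reduced
    df2 = n_obs - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2

def design_row(score, is_male):
    """Predictors may be functions of observed data (a square, a category code);
    the model remains linear in the weights applied to them."""
    return [score, score ** 2, 1 if is_male else 0]

# Hypothetical figures: 5 predictors vs. a 3-predictor subset, 106 observations.
f, df1, df2 = f_for_model_comparison(0.50, 0.40, 5, 3, 106)
print(round(f, 6), df1, df2)  # 10.0 2 100
print(design_row(3, True))    # [3, 9, 1]
```

Dropping the category-code predictors from the full model and testing the loss
in $R^2$ in this way is how an ANOVA-type hypothesis would be expressed in
regression terms.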






The other source of generality and power is the stepwise computing algorithm,
which may be used not only when a set of predictors is specified, but may also
be used to select the most predictive subset of variables for which data are
available. Moreover, the variables are selected in order of their ability to
add prediction of Y to that of the previously selected variables, up to some
stop criterion. This capability is especially useful for MISOE where the
dependent variable may be a level variable in dynamic simulation; the selected
prediction variables are then relevant candidates for other level variables in
formulating the simulation model. Moreover, the associated regression weights, or
functions thereof, are possible parameters in auxiliary equations modifying rates
in dynamic simulation. The actual, not the predicted, level of Y should be used
in the simulation model.

One objection to the stepwise algorithm is that it tends to capitalize
on sampling errors in the correlations. For this reason, and in the context
of most anticipated MISOE applications, the stop criterion should be set in
such a way as to reduce the number of variables entered while still giving a good
approximation to maximum prediction. The BIOMED regression program controls on
the F to enter or remove variables, which can be chosen relative to sample size by
correspondence with the associated probability level, with p of .05 entering more
variables than p of .01. The computer printout shows the $R^2$ at each step and
the variables selected at that stage. When, after several variables have entered,
one is removed, a point has probably been reached where one is dealing with
unstable artifacts based either on sampling error or on the multicollinearity
pattern of the system. One should stop iteration before that point. The program used
by the Air Force Personnel Laboratory was based on the old Kelley-Salisbury
technique of iterating on the regression weights, rather than on the n-th order
partials. In that program, the stop criterion was an increase in $R^2$ specified
in the control card, and for MISOE purposes it would be set at about .0004 (change
in R of .02).
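
A simplified forward-selection sketch, working from a stored correlation matrix,
may help fix ideas. It is not the BIOMED or Air Force algorithm: it only enters
variables (no removal step), it uses a plain Gauss-Jordan solver, and the
F-to-enter threshold of 4.0, the correlation values, and all function names are
illustrative assumptions.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(corr, y, subset):
    """Squared multiple correlation of variable y on the predictors in subset."""
    if not subset:
        return 0.0
    rxx = [[corr[i][j] for j in subset] for i in subset]
    rxy = [corr[i][y] for i in subset]
    beta = solve(rxx, rxy)
    return sum(b * r for b, r in zip(beta, rxy))

def forward_select(corr, y, candidates, n_obs, f_to_enter=4.0):
    """Enter, one at a time, the variable adding most to R^2 until F falls short."""
    selected, r2 = [], 0.0
    remaining = list(candidates)
    while remaining:
        best = max(remaining, key=lambda j: r_squared(corr, y, selected + [j]))
        new_r2 = r_squared(corr, y, selected + [best])
        df_resid = n_obs - len(selected) - 2   # N - k - 1 after entering `best`
        f = (new_r2 - r2) / ((1.0 - new_r2) / df_resid)
        if f < f_to_enter:
            break
        selected.append(best)
        remaining.remove(best)
        r2 = new_r2
    return selected, r2

# Hypothetical correlations: predictors 0-2 (mutually uncorrelated), criterion 3.
corr = [
    [1.0, 0.0, 0.0, 0.6],
    [0.0, 1.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.1],
    [0.6, 0.5, 0.1, 1.0],
]
selected, r2 = forward_select(corr, y=3, candidates=[0, 1, 2], n_obs=100)
print(selected)  # [0, 1]
```

The third predictor would add only .01 to $R^2$, so a conservative F-to-enter
leaves it out; a more liberal threshold would admit it.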



For exploratory purposes, more liberal stop criteria may be used than those
suggested above, allowing more variables to enter. Also, it is not necessary
to take the parameters of the final equation for use in subsequent analysis.

Greater flexibility in subsequent use of regression weights may be obtained
by having both raw and standardized weights computed. Most programs compute and
output one, but not the other. The subroutine to convert is simple to write
and incorporate as a computer program modification. Which form to use when
regression weights enter simulation equations will depend on the metric of the
associated level variables. Some metric issues in dynamic simulation will be
discussed briefly in Part Five.
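
The conversion between raw and standardized weights is the usual rescaling by
the ratio of standard deviations. A minimal Python sketch follows; the function
names and the example values are illustrative assumptions.

```python
def standardized_weight(raw_weight, sd_x, sd_y):
    """Standardized weight = raw weight * (s.d. of predictor / s.d. of criterion)."""
    return raw_weight * sd_x / sd_y

def raw_weight(standardized, sd_x, sd_y):
    """Inverse conversion: raw weight = standardized weight * (s_y / s_x)."""
    return standardized * sd_y / sd_x

# Hypothetical values: raw weight 2.0, predictor s.d. 3.0, criterion s.d. 12.0.
print(standardized_weight(2.0, 3.0, 12.0))  # 0.5
print(raw_weight(0.5, 3.0, 12.0))           # 2.0
```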

The staff may find that some variables enter none of the regressions and
if no other uses for such variables are found in MISOE operations, it may not 
be necessary to obtain data on them in the replacement cohorts. Conversely, if 
a variable enters rather consistently but weakly (late to enter; low regression 
weight) in many regressions, consideration should be given to obtaining purer 
measures of whatever factors are measured by the original variable. 

The use of ipsative measures, at least in regression, should probably be
avoided in MISOE. Examples are the Gordon and Edwards Personality scales and
forced-choice interest instruments, which can be useful in guidance and
counseling, but whose behavior in analysis is sometimes difficult to interpret. In
some of these instruments, item scores are ipsative but derived scale scores are
nearly independent, in which case the misgivings expressed here are less relevant.
Relative Contributions in Regression 

The results of a regression analysis express the pattern of functional
relationships explaining observed differences in the values of the predictand,
Y, in terms of observed or induced changes in the values of the predictors, $X_i$.
It is beyond the scope of this paper to discuss the semantic morass and attendant
philosophic issues implied by referring to such interpretation of regression
results as "causal inference". Various analytic operations have been promulgated,
however, for ascertaining the "relative importance", "relative contribution",
"independent contribution", or "unique contribution" of the $X_i$ to the
prediction of Y. All of the methods depend on the pattern of correlations among
the variables and none has any necessary reference to the temporal ordering
of events presumed in "causation"; nevertheless, the practical importance of
such operations is to give a partial answer to the question, "If I manipulate
conditions such that the value of one $X_i$ changes in the context of other $X_i$
related to Y (which other $X_i$ values may also change), what change in Y will
result?" Answering this question is useful to a decision maker. Dynamic
simulation adds the temporal dimension to this question and therefore to the
way it is answered.

Operations for answering the question in regression terms are of three
kinds: those primarily and directly depending on the rate of change in Y
with respect to a change in $X_i$, as expressed in the regression weights; those
accounting for variance in Y by partitioning partial regression variance and
either implicitly or explicitly involving residual scores; and a procedure
for partitioning variance in Y in terms of the orthogonal factor variance of
the system.

Interpreted as slopes, rather than as contributions to predicted variance in
Y, regression weights are legitimate indicators of relative importance, and also
of independent contributions of the $X_i$ in the special sense that intercorrelations
among the $X_i$ have been taken into account in the estimation of the weights. In
another sense, they are not independent of each other, since all $b_i$ estimations
depend on the $X_i$ and the $X_iX_j$ correlation pattern. The $b_i$ lend themselves to
formulating and solving dynamic simulation models, their relative size indicating
the need for incorporating the corresponding $X_i$, probably as a level variable in an
information loop, with its current mean as an initiating value. The $b_i$ values may
affect rates connecting levels in $X_i$ and Y, either in rate equations or in auxiliary
equations modifying rates; more likely the latter, since the $b_i$ are rates of change
of Y with respect to $X_i$ rather than with respect to time. A word of caution: the $b_i$
values are relative to the other variables included in the particular regression
analysis and to the particular population or subpopulation on which they were
estimated; presumably, then, the same variables should be involved as level variables
and the same subpopulations should be involved in the dynamic simulation model in
which they are used as moderators. Raw regression weights are also metric-
sensitive.

The accounting for variance in Y in terms of $X_i$ variances can also be
used to facilitate choices of variables to include in a simulation model and
the variance contributions used in rate modifying equations. Moreover, the
relative variance contributions, being ratios, are invariant under choice of
raw vs. standard metrics, which is not true of regression weights. Partitioning
of predicted variance also has the advantage that the contributions of process
to product, or of process to impact, can take account of the influence of input
variance, and should do so in dynamic simulation whether the input levels are
explicitly part of the model or not. Thus, the variance contribution ratio of
a process mix, used as a rate modifier in simulation, should not be contaminated
with input variance.

Two frequently used procedures for variance partitioning may be summarily
rejected unless the $X_i$ are mutually orthogonal. One, based on the formula for
the variance of a linear composite, defines the variance contribution of $X_i$
(with weights and variables in standard-score form) as:

$$b_i^2 + \sum_{j \neq i} b_i b_j r_{ij}$$

The other, based on a formula from regression theory, defines the contribution as:

$$b_i r_{iy}$$

Under conditions of orthogonality, the covariance terms vanish in the first
procedure, and, in the second, the $b_i r_{iy}$ reduce to $b_i^2$ or $r_{iy}^2$. When, as usual,
the $X_i$ are intercorrelated, neither the $b_i^2$, the $b_i b_j r_{ij}$, nor the $b_i r_{iy}$
are independent in any intelligible sense. Their only virtue is that they add
up to the total composite variance.

Most other methods for partitioning regression composite variance depend
implicitly or explicitly on residual scores of the form $Y - \sum_i b_i X_i$. This
includes procedures based on the Bottenberg-Ward comparison of two regression
models, one being a subset of the other (e.g., the Creager-Valentine or Mood-
Mayeske uniqueness-commonality model used in reanalyzing Coleman Report data).
In these procedures, "independent contribution" means the amount of unique
valid variance a subset of variables adds to the other variables (not in the
subset) to yield the total variance in the full model. Although the variance
partition of any one subset is orthogonal to the remaining subset, taken as a
whole, the variance partitions among subsets (independent) are not orthogonal
to each other; they add up to the total variance only because the commonality
at the top of the hierarchy of "joint contributions" is estimated by subtraction
from the total variance.
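
For two sets of variables the uniqueness-commonality arithmetic reduces to three
subtractions, sketched below in Python. The function name and the $R^2$ values
are hypothetical; the set labels A and B might stand for input and process
variables.

```python
def two_set_commonality(r2_a, r2_b, r2_full):
    """Unique and common parts of R^2 for two predictor sets A and B.

    r2_a, r2_b: R^2 from each set alone; r2_full: R^2 from both sets together.
    """
    unique_a = r2_full - r2_b   # what A adds to B
    unique_b = r2_full - r2_a   # what B adds to A
    common = r2_full - unique_a - unique_b  # obtained by subtraction from the total
    return unique_a, unique_b, common

# Hypothetical values of R^2.
unique_a, unique_b, common = two_set_commonality(0.30, 0.25, 0.40)
print(round(unique_a, 2), round(unique_b, 2), round(common, 2))  # 0.15 0.1 0.15
```

The three parts sum to the full-model $R^2$ by construction, which is the point
made above: additivity is guaranteed by the subtraction, not by orthogonality of
the parts.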

When the total regression composite has been built up by stepwise selection,
the most valid and least correlated variables are likely to be picked, so that
commonality partitions and such associated procedures as covariance control for
input (discussed in the next section) may be reasonable and practical for MISOE.
Even with some collinearity in the system, the inferential differences between
partitioning variance in this way and using an orthogonalizing refinement should
be negligible (see, e.g., Creager, "Academic Achievement and Institutional
Environments: Two Research Strategies", Journal of Experimental Education, 40, No. 2, 1971).

The orthogonal partitioning of regression composite variance into the
common and unique components defined by complete orthogonal factor analysis is
applicable to any linear composite, including canonicals and discriminant
functions (see Creager and Boruch, "Orthogonal Analysis of Linear Composite
Variance", Proceedings, 77th Annual Convention, American Psychological Association,
1969; Creager, "A Fortran Program for the Analysis of Linear Composite Variance",
Educational and Psychological Measurement, 31, No. 1, Spring, 1971; and Creager,
"Orthogonal and Nonorthogonal Methods for Partitioning Regression Variance",
American Educational Research Journal, 8, No. 4, November, 1971). The advantages
are that the partitioned components are all mutually orthogonal and additive to
total variance, and can therefore be pooled across factors defined by subsets
of variables of analytic concern (e.g., input, process, product, or impact). To
be maximally useful in practical application, the factors must be interpretable,
either as a hierarchical structure like that of Schmid and Leiman (Psychometrika,
22, No. 1, March, 1957) or as an approximation to simple structure in which
factor variance is spread across factors (normalized varimax rotation is popular).
The procedure requires factor analysis of the regression system, and better
factor definition can be obtained if additional marker variables are included
in the factor analysis (their weights in the regression composite are zero and
have no effect on the account of variance in that composite). Moreover, the
factors may be less conceptually meaningful and communicable to MISOE users than
the directly observed variables; in some applications, however, the factors can
be regarded as more meaningful and the variables can be regarded as proxy measures
of those factors. The delineation of space differentiations with their associated
instrumentation classes suggests that such a view may already be implicit in
staff thinking.

One implication of this line of thought is that some of the level variables
in dynamic space may be factors rather than observed variables, but this would
require the additional computation and storage of factor scores and development
of trend information for use in defining rate equations for simulations. Whether
or not this capability will be considered in further MISOE development, the
variance accounting use of the orthogonal partitioning for input control (its
original purpose) may be useful where the $X_i$ are moderately intercorrelated.




Control of Process-Product and Process-Impact Analyses for Differential Input

Serious inferential errors may result from conducting process-product and
process-impact analyses without control for the nonrandom variation in input
variables among students subjected to various process treatments. A given
program or process variable may be unduly credited (or blamed) for changing
students who differed before treatment or who would have changed the same amount and
in the same direction given an alternative treatment (or no explicit treatment
at all). Therefore, it is generally wise to pretest students at input time on
product and impact variables and to use procedures which control analysis of
process effects for input. An exception has been indicated which
simplifies matters by making a plausible assumption: that input levels for
objective skills in product space are constant at zero for secondary students
in occupational education programs. While probably not strictly true, and ignoring
possible differences on related student characteristics (e.g., psychomotor
abilities), the assumption is probably a practical one. It is recommended,
however, that such an assumption not be extended either to other variables in the
product and impact spaces, or to other educational sectors (non-OE).

All but one method of controlling the process effects for input depend on
some manipulation of residual scores, which may or may not be explicitly computed
and manipulated. Astin ("The Methodology of Research on College Impact",
Sociology of Education, 1970, 43) has reviewed several strategies which use
regression methods, involve multiple-part and/or multiple-partial correlations,
and, in effect, are variations of multivariate analysis of covariance. These
procedures are useful and practical. Critics point to the unreliability of
residual scores, but usually support alternatives in which residuals are
implicitly involved. Various schemes for stratifying, matching, and moderating have
similar problems.

The main objection to such a practical approach as first regressing product
on input and, then, regressing residualized product on residualized process, is
that not all factors common between input and process should be treated as input
(or as process, if the order of regressing the sets is reversed) sources of
variance. They may be exogenous or situational factors (e.g., cultural climate,
community affluence) affecting both input and process, but not sensibly
identified with either set of variables. E.g., the relation between family income
and space per student might be a function of local community affluence, and if
some student outcome (product or impact) were related to both, it would be quite
doubtful whether management should be advised to raise salaries and wages in
the community or raise taxes in the community to enlarge floor space at the
local school; it would be quite dubious to partial family income out of space
per student in the analysis.
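
The two-step residual approach named at the start of this paragraph can be
sketched in Python for the simplest case of one input, one process, and one
product variable. The data and function names are hypothetical, and the sketch
shows only the mechanics; it does nothing to resolve the objection just raised
about factors common to input and process.

```python
def mean(values):
    return sum(values) / len(values)

def simple_regression(x, y):
    """Least-squares slope and intercept for y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def residuals(x, y):
    """Part of y not linearly predictable from x."""
    slope, intercept = simple_regression(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def process_slope_controlling_input(inp, process, product):
    """Regress residualized product on residualized process (both on input)."""
    product_res = residuals(inp, product)
    process_res = residuals(inp, process)
    slope, _ = simple_regression(process_res, product_res)
    return slope

# Hypothetical data constructed so that product = 2*input + 3*(process - input).
inp = [1.0, 2.0, 3.0, 4.0]
process = [2.0, 1.0, 2.0, 5.0]
product = [5.0, 1.0, 3.0, 11.0]
print(process_slope_controlling_input(inp, process, product))
```

With these constructed data the residualized slope recovers the value 3 built
into the product scores, whereas a direct regression of product on process would
mix in the input effect.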

With such a situation, by no means uncommon, the orthogonal analysis
of the variance of a full model composite derived from free entry to both input
and process variables may be helpful. Using a hierarchical factor analysis, the
composite variance may come out on input factors, process factors, and on factors
that are defined by a combination of input and process variables. With the
latter explicit and interpretable, valuable clues about "reality" may result as
well as some suggestions for formulating simulation models.
Principal Components and Factor Analysis

Most computer program packages contain routines for extracting eigenvalues 
and eigenvectors, and for performing factor analysis, including rotating trans- 
formations (often restricted to the normalized varimax rotation). Such capa- 
bility will be required for MISOE if orthogonal analysis of composite variance, 
canonical, or discriminant analysis are anticipated. Moreover, they will be 
useful directly for descriptive analysis of MISOE data content within and across 
descriptive space subsystems, whether a factorial approach to simulation is 
contemplated or not. It will be useful to know the extent of within-space 
redundancy of information. The priority for such capability is moderate, being 
less than that for distribution and regression analysis, but otherwise valuable 
for MISOE to have in its analytic repertoire. Certain variations of factor 
analysis, such as alpha or image analysis, or the Guttman simplex,* circumplex 
and radex models are of doubtful utility for MISOE and such specialized 
capabilities may be deferred until the need for them becomes apparent. 

The number of principal components or factors to rotate is popularly taken 
to be those with associated eigenvalues greater than unity. This practice has 
been challenged by Humphreys (Educational and Psychological Measurement, 24, 1964) 
when sample size is large, and by Shaycoft ("The Eigenvalue Myth and the Dimen- 
sion-reduction Fallacy", mimeo available from the author), on both theoretical 
and empirical grounds. It is better to examine the plot of eigenvalue number 
against its size, looking for a break in the curve below a unit eigenvalue, but 
in any case to allow more degrees of freedom for rotation than permitted by the 
common rule. It will rarely be worthwhile to rotate vectors with eigenvalues 
less than .75, but one needs to retain after rotation only those factors which 
have loadings on more than one variable for defining common factor space. For 
the orthogonal analysis of composite variance, it is better to rotate too many 
than too few vectors because the purpose is to account for total variance rather 
than to minimize rank. If a good fit to hypothesized structure can be obtained 
by maximum likelihood methods, this capability may be useful (Joreskog and 
Gruvaeus, Educational Testing Service Research Bulletin, RB 67-21). Generally, 
oblique solutions will not be useful for MISOE except that the factors may more 
nearly match the meaning people associate with the factor name. 
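
The retention rules discussed above can be compared on a list of eigenvalues already extracted by a standard package. The following is a minimal sketch in Python; the function name, the eigenvalues, and the operational definition of a "break" (the largest drop between successive eigenvalues at or below unity) are assumptions made for illustration only.

```python
def retention_counts(eigenvalues, floor=0.75):
    """Count the vectors retained under three rules of thumb."""
    ordered = sorted(eigenvalues, reverse=True)

    # Popular rule: keep eigenvalues greater than unity.
    unit_rule = sum(1 for v in ordered if v > 1.0)

    # Looser floor suggested in the text: rarely rotate below .75.
    above_floor = sum(1 for v in ordered if v >= floor)

    # Assumed definition of a "break below unity": the largest drop
    # between successive eigenvalues that are both at or below 1.0.
    best_drop, break_after = 0.0, unit_rule
    for i in range(unit_rule, len(ordered) - 1):
        drop = ordered[i] - ordered[i + 1]
        if drop > best_drop:
            best_drop, break_after = drop, i + 1

    return {
        "unit_rule": unit_rule,
        "above_floor": above_floor,
        "break_below_unity": break_after,
    }

# Hypothetical eigenvalues of an 8-variable correlation matrix (they sum to 8).
example = [3.1, 1.6, 0.95, 0.85, 0.45, 0.40, 0.35, 0.30]
counts = retention_counts(example)
print(counts)
```

With these illustrative values the unit rule would rotate two vectors, while both the .75 floor and the break in the curve suggest four. A visual inspection of the plot remains the procedure the text recommends; the automated break is only a rough stand-in.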
Canonical and Discriminant Analysis 

An initial view of the multiple-space structure of MISOE suggests that a 
general capability to perform canonical regression relating one space mix to 
another would be indicated. With the exception of the special case of multigroup 
discriminant analysis, this is somewhat doubtful because: 

*Not to be confused with the simplex algorithm of linear programming. 

1. "mixes" will probably be better defined within spaces in accordance 
   with simple regression against individual criteria, to maintain 
   flexibility, and spaces better related by simple regression of coded 
   mixes, than by defining mixes by weights that maximally correlate 
   different space mixes. This is, however, debatable and further discussion 
   of the point is invited (see also Part Four below); 
2. canonical vectors may be difficult to interpret both within the staff 
   and to external users of MISOE; and 
3. the use of canonical information for dynamic simulation appears moot. 
General canonical capability and associated models like Tucker's interbattery 
factor analysis and Hotelling's most predictable output mix are given low priority 
at this stage of MISOE development. 

Multiple discriminant analysis has two parts: First, given an a priori 
set of groups to be discriminated (e.g., those "alumni" with certain product 
or impact mixes), define the discriminant space as a weighting of student 
characteristics (or process variables) which maximally discriminate the output 
groups (e.g., successful and satisfied, law-abiding and taxpaying citizens vs. 
welfare recipients, felons, and frustrated, angry protestors). The second 
part deals with the classification and allocation of personnel,* such as a new 
input cohort, to groups on the basis of their characteristics. Management not 
only has the option of changing process, but also of changing student inputs, 
by using guidance and counseling procedures which advise the student of likely 
outcomes of decisions (which should remain his) to enter certain programs or 
pursue certain occupational careers. Since the book by Rulon et al., cited above, 
thoroughly discusses this analytic area, further discussion here is unnecessary 
beyond a judgment that discriminant capability (and associated personnel classi- 
fication and allocation) has a moderately high claim for priority in MISOE devel- 
opment and a strong chance that it may prove useful for certain classes of man- 
agement decisions. 



*This can be applied in an original data space as well as in a discriminant space. 

Temporal Analysis 

The longitudinal aspects of MISOE and the dynamic simulation plans demand 
analytic capability in descriptive space for temporal analysis. None of the 
foregoing models and analytic capabilities pays any explicit attention to the 
passage of time; implicitly the information is temporally ordered by association 
with the temporal order of MISOE elements and by reference to cohort replacement 
sampling and followup measurement in impact space. This reflects staff recognition 
that it takes time for students to flow through the educational process and to 
"make their mark in the world", and that it takes time for management decisions 
to be implemented and for their effects to be felt. 

The plotting of aggregate census data and of appropriately weighted sample 
data against time should provide some of the rate functions and some of the 
modifying auxiliary functions required for dynamic simulation, which is MISOE's 
major approach to coping with the temporal aspects of the state system of education. 
The problem for MISOE development is to ensure that variables whose values or 
distributions change over time with some natural (i.e., without management decisions) 
and reasonably smooth frequency are repeatedly measured and that the information 
storage and retrieval system (especially coding) reflects the time of measurement, 
or more importantly, the time at which measured events occur. In the case of 
changes induced by management decision (whether under MISOE recommendations or not), 
change in level or distribution of a variable may be immediate, unique, and dis- 
continuous, or may have scattered and delayed effects across the system. It would 
seem essential that MISOE have codable, storable, and retrievable knowledge of the 
nature and time of such decisions and of their implementation, if the system is 
to be able to reflect "reality" and if some dynamic simulation models will include 
information feedback loops. It is recommended that some attention to these 
issues and to their analytic consequences be given rather early in further MISOE 
development. 

For the more natural and smoothly occurring changes, plotting routines are 
available in many program packages (BIOMED, DYNAMO), and may in fact be useful 
in describing more precipitous changes. Also, for the latter, Campbell's dis- 
continuity regression concept may prove to be a useful tool, but it is not clear 
at present how this might be integrated with other analysis procedures for MISOE. 

The whole methodology of lag correlations used in econometrics and in 
certain mathematical formulations of learning theory may be useful analytic tools 
for MISOE in dealing with temporal analysis. No attempt will be made to delineate 
these possibilities in this paper, but their potential utility for MISOE should 
have an early appraisal. 
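
As a point of reference for such an appraisal, a lag correlation simply correlates one series with a later-shifted copy of another. The sketch below is a minimal Python illustration; the two yearly series and their names are hypothetical, and the second series was constructed to follow the first with a two-period delay.

```python
import math
from statistics import mean

def pearson(x, y):
    """Product-moment correlation of two equal-length series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lag_correlation(x, y, lag):
    """Correlate x at time t with y at time t + lag (lag >= 0)."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    return pearson(x, y)

# Hypothetical yearly series (illustrative values only).
enrollment = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0, 18.0, 13.0]
placements = [4.0, 5.0, 6.0, 7.0, 8.5, 6.5, 5.5, 8.0]

by_lag = {lag: lag_correlation(enrollment, placements, lag) for lag in range(4)}
best_lag = max(by_lag, key=by_lag.get)
print(best_lag, round(by_lag[best_lag], 3))
```

The lag at which the correlation peaks gives a rough estimate of the delay between the two series, which is the kind of information a rate or auxiliary function in dynamic simulation might need.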

Miscellaneous Analysis Tools of Moderate to Lower Priority 

This section deals briefly with some analytic tools of potential value to 
MISOE, but for which there does not seem to be any immediate demand for inclusion 
in system capabilities. 

The first of these is path analysis, originally designed to delineate and 
investigate causal hypotheses in genetics, and in recent years, adapted and elabo- 
rated by sociologists. Typically, a hypothesized pattern of causal relationships 
is represented in a path diagram. The solution of a set of linear 
equations provides "path coefficients", which are usually regression weights or 
simple functions thereof, and which are interpreted to represent the strength of 
a particular path in the diagram. It is difficult to see what path analysis 
could do for MISOE that is not better handled by the dynamic simulation model, 
where rates and their modifiers provide a more complete map of "reality" and time 
is explicitly taken into account. The only reference to time in path analysis is 
the use of arrows in the path diagram to connect level variables and to represent 
the temporal order of events. Staff familiarity with the logic of path analysis 
may be helpful in formulating dynamic simulation models. A recent paper gives a 
good introduction and list of references pertaining to path analysis (Anderson, 
James G. and Evans, Francis B., "Causal Models in Educational Research: Recursive 
Models", Working Paper No. 50, Institute for the Study of Social Change, Department 
of Sociology and Anthropology, Purdue University, 1972.) 

Hierarchical grouping is an empirical taxonomic procedure of considerable 
potential value to MISOE. The need for it is not envisioned as imminent, hence 
the relatively low priority given here; nevertheless, the staff should consider 
adding such a capability to its analytic repertoire at a not too distant future 
date. It could be useful for defining student types, classes of process and 
product mixes, or "alumni" types (for subsequent discriminant analysis, as indicated 
above). The procedure is one way of reducing the large masses of information 
in MISOE, where the loss of information on individual objects can be tolerated. 
Some key references to the logic and applicability of this model are: 

Ward, Joe H., Jr. "Hierarchical Grouping to Optimize an Objective Function." 
Journal of the American Statistical Association, 58, March, 1963. 

Ward, Joe H., Jr. and Hook, Marion E. "Application of an Hierarchical 
Grouping Procedure to a Problem of Grouping Profiles." Educational and Psychological 
Measurement, 23, No. 1, Spring, 1963. 

Bottenberg, R. A., and Christal, R. E. "An Iterative Technique for 
Clustering Criteria which Retains Optimum Predictive Efficiency." WADD-TN-61-30, 
Personnel Laboratory, Lackland AFB, Texas, March, 1961. (Clustering of regression 
equations in terms of homogeneity of regression.) 

Rock et al. (American Educational Research Journal, Vol. 9, No. 1, Winter, 
1972) have proposed and illustrated a strategy for studying process effects by 
grouping programs on the product-input regressions and then using process variables 
to discriminate the groups equated for input. The strategy combines regression, 
hierarchical grouping, and discriminant analysis, and is too new to permit a fair 
appraisal. One difficulty that may be encountered is that the regression composites 
within some of the smaller programs may be too unstable to carry the grouping and 
discrimination load which follows. 

The "policy capturing" model is a special case of static simulation using 
regression analysis to simulate subjective (or aesthetic) human judgements. 
This is done in terms of the objective information available to the judge(s) about 
the set of objects being rated or ranked. It has some interesting possibilities 
for MISOE and for the management of MISOE in the latter's interaction with rep- 
resentatives of societal space. It is quite conceivable that MISOE might want 
to simulate (dynamically) for state level management the effects of local policy 
judgements on the state system, where such judgements are made on the basis of 
subjective weighting of information available either locally or through state and 
regional communication channels. This may be useful in incorporating and using 
information feedback loops in dynamic simulation models. One outcome or policy 
option for the state level manager may be selective emphasis in information dis- 
semination; alternatively, the MISOE management may want to know the relative 
weights that educational management gives to information in the system, whether 
or not that information is MISOE input or output. 

To apply the model requires collection of ratings or other scaled judgements 
on a set of objects (programs, allocations, relative importance of societal goals, 
etc.), measures of the information available to the judges (a single manager, 
managers, a committee or panel) about the objects rated, and the regression package. 
The multiple R is usually very high and measures the validity of the policy 
capturing simulation, and the weights give the substantive information as averaged 
across the judgements. The technique is also useful in aiding panel consensus by 
feedback of its results to the judges. Two papers by Dr. Raymond E. Christal, 
Personnel Laboratory, Lackland AFB, Texas are relevant: 

"Selecting a Harem — And Other Applications of the Policy-Capturing Model", 

PRL-TR-67, and 

"JAN: A Technique for Analyzing Group Judgement", PRL-TDR-63-3. 
The latter involves integration with hierarchical grouping. 

The National Academy of Sciences - National Research Council uses the 
procedure in the evaluation and selection of candidates for National Science 
Foundation graduate fellowships. The multiple correlation between objective and 
codable information provided to the judging panels and the judged ratings of 
applicants has consistently been about .85 over several years. This is only 
slightly less than the estimated reliability of the panel judgements. 
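
The mechanics of policy capturing amount to an ordinary multiple regression of the judged ratings on the objective information given to the judges. The following minimal Python sketch illustrates this; the two cues, the six rated objects, and the ratings are hypothetical, and the small equation solver merely stands in for "the regression package".

```python
from statistics import mean

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def capture_policy(cues, ratings):
    """Regress ratings on cues; return (intercept, weights, multiple R)."""
    rows = [[1.0] + list(c) for c in cues]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ratings)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    my = mean(ratings)
    ss_total = sum((y - my) ** 2 for y in ratings)
    ss_resid = sum((y - f) ** 2 for y, f in zip(ratings, fitted))
    multiple_r = max(0.0, 1.0 - ss_resid / ss_total) ** 0.5
    return coef[0], coef[1:], multiple_r

# Hypothetical objective information on six applicants: (test score, grade average).
cues = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
# Hypothetical panel ratings of the same six applicants.
ratings = [5.1, 5.9, 10.9, 12.1, 17.1, 17.9]

intercept, weights, multiple_r = capture_policy(cues, ratings)
print([round(w, 2) for w in weights], round(multiple_r, 3))
```

The weights are the "captured" policy, here showing that the hypothetical panel leaned about twice as heavily on the first cue as on the second, and the multiple R indicates how well the regression reproduces the panel's judgements.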

Part IV. Noneconomic Analysis Considerations Within and 
Among Subsystems of Static Space 

Introduction 

Although Parts II and III discussed analytic issues of concern regardless 
of the MISOE subsystems involved, considerable reference was made to these sub- 
systems. Nevertheless, more specific issues are raised in Occasional Paper No. 5 
for descriptive analysis in static space, and in Occasional Paper No. 6 for 
simulative analysis in dynamic space. This part focuses on the more specific 
analysis issues raised in static space; the next part focuses on those 
raised in dynamic space. Primary concern in the subsequent sections of 
this part will be with educational space and with the educational post-impact 
space (see Figure 1 of Occasional Paper No. 5). However, the need to define 
optimal process and product mixes by student type, combined with the fact that 
available data represent the status quo, poses a special problem for analysis not 
considered in Part Two above. The next section, therefore, discusses this problem, 
prior to giving specific attention to the analysis problems within and among the 
process, product, and impact spaces. 
The Range Restriction Problem 

Much of the initial data for MISOE come from the present structure and 
"student flow" characteristics of the operating educational system. As noted in 
Part III, above, prior educational management decisions about what kinds of 
students enter what kinds of programs (with their associated processes, products, 
and impacts) result in differential inputs to the various "pipelines". Thus 
all data about students within any process-product-impact channel reflect the 
status quo. Managers will want to know the result if some student mixes not 
presently in a channel were permitted or encouraged to enter this channel, or, 
put another way, the result if a given student mix were to go through a different 
educational channel. The earlier discussion of controlling analysis for dif- 
ferential input was concerned with reducing the risk of inferential error when 
comparing results of analyses across channels (e.g., OE vs. general vs. academic; 
TV repair vs. automechanics; automechanics in school A vs. that in school B), or 
when judging the efficacy of a process within a channel. In the search for optimal 
matches between student and program mixes, or the search for optimal mixes within 
spaces given a fixed mix in another space, data for currently nonexisting matches 
of students and programs will not be available for comparison. This implies that 
optima may be missed. Moreover, inferences from analyses carried out within an 
IPPI channel will be strictly relevant to the status quo for that channel. 

In dynamic simulation, initial values of level variables can, and often 
should, represent the status quo, and then, additional simulation runs can be 
made with different values specified by hypothesis. However, if some of the rates 
and auxiliary modifying equations are to express relationships derived by within- 
channel regression analysis, an assumption is being made that these relationships 
will hold for alternative student input mixes. The assumption may well be false 
and therefore inductive of inferential errors. 

What is involved analytically is that the correlations based on a particular 
student input mix (whether or not the correlations involve the student characteristics 
variables) will generally be smaller than those based on the entire student input 
population, or on subsets of that population that include alternative student 
inputs to a special channel that might be under consideration (e.g., an LEA con- 
cerned with students in the local community rather than with the whole state dis- 
tributions). Moreover, the attenuation of correlations from "restriction of 
range" along one or more dimensions of student space will be nonuniform across a 
set of variables, thus distorting the pattern of correlations in a matrix in 
addition to lowering their average value. This situation distorts all channel 
regressions and regression parameters and distorts the regression techniques for 
controlling channel regressions for differential input. 

Two kinds of formulas exist for "correcting observed correlations for range 
restriction". One is applicable in some MISOE situations to correlations between 
student variables and process, product, or impact variables; the other in some 
situations to correlations among process, product, or impact variables the variance 
of which has been restricted by student input selection. Each of these formulas 
exists for correcting single correlations for selection on a single variable, and, 
in its multivariate generalization, permits correction of whole correlation 
matrices for restriction on one or more correlated student variables. 

These formulas are presented with references to books by Thorndike and by 
Gulliksen in a set of memoranda attached to this paper as an appendix. It is 
recommended that uncorrected correlations be stored and retrieved, and if correction 
is required for a particular analysis, it can be done prior to entering the cor- 
relations into regression analysis. The correction requires an ad hoc Fortran 
program, probably not available in commonly used packages. 

Awareness of the assumptions about the nature of the restriction on which 
these formulas are based may guide staff judgements about their use in MISOE. 
Basically the formulas assume that: 

1. restriction was caused by truncation (e.g., applying a cutoff score 
   for admission to a channel) on a variable, 
2. this truncation was strictly adhered to, 
3. the raw score slopes (b's, not betas) of the regressions involved were un- 
   affected, and 
4. the variables were all measured without error. 
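
For orientation, the simplest of these corrections is the one for a single correlation with direct selection on one of its two variables. The Python sketch below uses the standard textbook form (often cited as Thorndike's Case 2); it is offered as an assumption about what the appendix memoranda contain, since those memoranda are not reproduced here, and the function name and example values are hypothetical.

```python
import math

def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Estimate the unrestricted correlation from a restricted one.

    r_restricted    -- correlation observed in the selected group
    sd_unrestricted -- standard deviation of the selection variable
                       in the full student population
    sd_restricted   -- its standard deviation in the selected group
    """
    k = sd_unrestricted / sd_restricted
    r = r_restricted
    return (r * k) / math.sqrt(1.0 - r ** 2 + (r * k) ** 2)

# Hypothetical case: r = .30 within a channel whose entrants have a
# standard deviation of 10 on the selection variable, against 15 statewide.
corrected = correct_for_range_restriction(0.30, 15.0, 10.0)
print(round(corrected, 3))
```

When the two standard deviations are equal the correlation is returned unchanged, and the correction grows as the selected group becomes more homogeneous. The result is only as trustworthy as the four assumptions listed above.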
Regarding the last, the correction procedure could be entered with correlations 
corrected for attenuation due to measurement error. The first three as- 
sumptions are plausible in the military situation for recruits assigned to train- 
ing programs (the situation for which the formulas were developed and most fre- 
quently used), but not always valid even then. They are plausible in local 
situations where the educational manager has specified and enforced such cutoffs 
in selection (e.g., a minimum IQ to enter this program), and where the standard 
deviations are known for the sector of student space served by his jurisdiction. 
Even under ideal conditions where the b's are unaffected, one may prefer to use 
"variance accounted for" in interpretation and simulation, for reasons discussed 
in Part III. 

More serious are those MISOE situations for which these formulas 
would be of limited or even dubious value, but where the problem remains. 
In one example presented by the staff, the principal interviews the "applicants" 
for a program in his school, and judges the "interest and motivation" of the 
student for that program. In this case, one could probably use the formulas 
(assuming one has an external measurement of the relevant interest) even though the 
exact cutoff and the consistency of its application may be unknown; the formulas 
require measurement of the effect in terms of a comparison of standard deviations. 
Another example is the situation where analysis is being performed on information 
pooled across schools giving the "same" program, but with variations across schools 
in actual admissions criteria applied to various pools of potential students. 
Partial solutions for such situations are presented in a memorandum on "simulating" 
complex selection, appearing as an appendix to this paper. Under some conditions 
it is even possible to regenerate a normal bivariate scatterplot seriously mutilated 
by complex selection realities, by iterative operations based on the discrepancies 
between pre- and post-mutilation marginal distributions. This possibility is des- 
cribed in another memorandum attached to this paper. The efficacy of these suggestions 
in a practical setting is unknown. 

For some analyses involving product and/or impact correlations (within 
and between spaces), the range restriction corrections may be involved where 
allowance must be made for selective losses due to dropouts. Moreover, for MISOE 
to make the kinds of comparisons across occupational, general, and academic educational 
programs in terms of general educational development requires not only input GED 
measurement in student space, but also allowance for the multivariate restriction 
of range implied by differential selection on achievement among these "tracks", 
regardless of whether the choices are made "freely" by students or as a result of 
some kind and degree of management intervention. 

Analysis Considerations for Noneconomic Factors in the Process Space 

Staff delineation of the process space is documented in Occasional Papers 
No. 2 and 4, with an addendum to the latter included in Occasional Paper No. 6. 
Although some comments on the contributions of papers 2 and 4 were included in 
the two letters sent following the February conference, this section develops 
some issues raised there and in Occasional Paper No. 6. In several ways the 
process space is the heart of the system and a major source of its complexity. 
It is also the major MISOE subsystem in which economic and noneconomic aspects 
interface with each other and with implications for developing rate and auxiliary 
modifying functions in dynamic simulation. Proper treatment of this important 
topic will require later integration of the concepts currently being developed in 
Occasional Papers No. 7-12. 

The process space involves both description of the climate of learning in 
terms of human, physical, and organizational factors (see Occasional Paper No. 4), 
and that of the content and sequencing of instructional events (units) as organized 
into blocks for each program. Analytic capability must be provided for both kinds 
of process descriptions and their interactions in the vector formulation of a process 

mix. The vector, or subvectors (formed from a subset of the defining variables) 
can, of course, be coded so that student types can be matched with process and 
product mixes. Process information should be obtained, in accordance with the 
sampling design, locally, within a program within a school, so that pooling of 
data on common variables can be accomplished across schools and programs, locally 
and at regional and state levels. "Interactive" process variables may either be 
directly observed, e.g., the number of students on a piece of equipment, a human- 
physical combination, or be generated as needed in the form of terms, e.g., 
the joint occurrence of a teacher characteristic (more experience) and assignment 
to a physical factor (the better equipped of two available laboratories). The 
latter example is likely to occur, but for a better product over all students in 
that program at that school, the more experienced teacher may be better able to 
adapt to less ideal physical arrangements at the same salary and equipment cost. 

In a particular program there may be variations in content and sequencing 
of instructional events from one school to another (or one locale to another). 
Content variations (additions or deletions of particular units or blocks) permit 
investigation of their efficacy in terms of products and impacts. If something 
like 85% of the schools giving a program have the same content structure (blocks 
and units) and 15% have one or more variations, a dichotomous code can be defined 
permitting regression comparisons over schools. If a program has something like 
50% common structure across schools, it is likely that additional dichotomous 
variables can be defined to tag additional variations across schools in the con- 
tent structure of a program. Sequencing of blocks, or of units within blocks, within 
a program, can be similarly treated. Precisely what is feasible will depend on the 
counts of such content and sequencing variations. Greater flexibility may be ob- 
tained if the process record shows an actual sequence, e.g., 

    2/ 1S/ 4/ 3/ 

could indicate that unit 2 is given first, units 1 and 4 are given concurrently 
(S denotes unit given simultaneously with the next unit shown), followed by unit 3. 
Where this is variable for students rotating sequences to maximally utilize 
available equipment, such a sequence code should be posted to the student record 
along with data that indicate that the particular student went to a particular 
school and took a particular program. (Note: this kind of cross-linkage between 
student and program information is crucial for MISOE; some of it is a matter of 
appropriate codes being placed in information records, a format or layout problem; 
some of it is a matter of the addressing in the information storage and retrieval 
system. The staff appears to have awareness of this and to be making appropriate 
provisions.) 
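
A sequence code of the kind just described is easy to decode mechanically. The following minimal Python sketch assumes the example code reads "2/ 1S/ 4/ 3/", that is, unit numbers separated by slashes with a trailing S marking a unit given simultaneously with the next one shown; the function name and output layout are hypothetical.

```python
def parse_sequence_code(code):
    """Decode a sequence code into ordered steps of concurrent units.

    Each step is a list of unit numbers given at the same time.
    """
    steps, current = [], []
    for token in code.split("/"):
        token = token.strip()
        if not token:
            continue  # skip the empty piece after the final slash
        concurrent = token.upper().endswith("S")
        unit = int(token[:-1] if concurrent else token)
        current.append(unit)
        if not concurrent:
            steps.append(current)
            current = []
    if current:  # a trailing S with nothing after it
        steps.append(current)
    return steps

steps = parse_sequence_code("2/ 1S/ 4/ 3/")
print(steps)
```

Decoded this way, the sequence can be posted to the student record as an ordered list and compared across students or schools, for instance to derive dichotomous codes for particular orderings.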

Similar logic and treatments are relevant to variations among students 
and schools in time-spent-per-unit. The example on page 46 of Occasional Paper 
No. 5 in which students move on to the next unit regardless of performance may 
not always be the case for all programs, schools, and levels (and in any case, a 
relevant question is whether the policy is a wise one). Many of the variables 
descriptive of the general setting and specific instructional climate may be 
indicated by dichotomous coding, permitting easy generation of codes for joint 
occurrence of process characteristics. 
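
Generating a joint-occurrence code from dichotomous codes is a one-line operation, as the following minimal Python sketch suggests. The record layout, school labels, and characteristic names are hypothetical.

```python
def joint_code(record, characteristics):
    """Return 1 when every listed dichotomous characteristic is coded 1, else 0."""
    return int(all(record[name] == 1 for name in characteristics))

# Hypothetical process records, one per school offering the same program.
process_records = {
    "school_A": {"experienced_teacher": 1, "better_laboratory": 1, "content_variant": 0},
    "school_B": {"experienced_teacher": 1, "better_laboratory": 0, "content_variant": 1},
    "school_C": {"experienced_teacher": 0, "better_laboratory": 1, "content_variant": 0},
}

# Joint occurrence of a teacher characteristic and a physical factor.
teacher_and_lab = {
    school: joint_code(record, ["experienced_teacher", "better_laboratory"])
    for school, record in process_records.items()
}
print(teacher_and_lab)
```

Because the joint codes can be generated on demand from the stored dichotomies, they need not themselves be stored, which helps keep the process records from growing further.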

Some of the above suggestions imply long data records for the process 
space. They may also imply the need to expand the information storage and re- 
trieval system as delineated in Occasional Paper No. 4, figures 1 and 2. 
Occasional Paper No. 5, Pages 45-49, indicates staff progress in keeping this 
system operationally flexible. It may be necessary for the staff to prepare 
a document fully delineating this system in the light of Occasional 
Papers No. 7-12. 

The notion of using a standard form for collecting process information 
for each program is an excellent one, and the indication on the form of the 
storage-retrieval (or other) codes should facilitate carrying document information 
into computer storage. The idea of assigning a process mix number is also useful, 
but it should be noted that a total process mix may contain subvectors or submixes 
of frequent and selective interest. The only question is how far to carry this. 
Perhaps one submix code would be for the human-physical-organization factors, or 
one code for each of its three component submixes, and another would be for the 
content and sequencing information. Each submix, however defined, should have its 
own ID number. 

Units of analysis conducted within process space are likely to be schools 
having a given program. In some analyses comparisons of selected climate infor- 
mation across programs within a large school may be desired. This is in contrast 
to student space where students are likely to be the unit of analysis, and in 
product and impact spaces where either students, schools, or programs may be the 
analysis units. A given analysis across MISOE subsystems will have to deal with 
this. A particular inquiry will have to be judged as primarily focused on answering 
questions about what happens to students or in terms of what processes are under 
study. In the first case, students and/or student types are followed through the 
system and for each the appropriate process mix is retrieved and merged with stu- 
dent input and output data. In the second case, the data for students entering 
and leaving a process are averaged over those entering a particular process mix 
(e.g., in terms of schools or programs), retrieved and merged with similar aver- 
aged output information. One will usually obtain higher multiple correlations 
in the second case, but the real question is which is appropriate for a given 
inquiry (or subinquiry): are we concerned with what processes do to individuals, 
or with what processes do in the aggregate for society? Overall, both, but 
not within the same specific analysis. The same question applies in dynamic 
space and must be answered the same way where regression information is to inter- 
face and provide input information for simulation. Even if they can be mixed in 
simulation, separate regressions by units of analysis will be required. 
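
The tendency for aggregated data to yield higher correlations can be seen in a small constructed example. The Python sketch below is illustrative only; the nine student records, grouped into three schools, are hypothetical and were chosen so that the arithmetic is easy to follow.

```python
import math
from statistics import mean

def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical records: (school, input score, output score).
students = [
    ("A", 1.0, 3.0), ("A", 2.0, 2.0), ("A", 3.0, 1.0),
    ("B", 4.0, 6.0), ("B", 5.0, 5.0), ("B", 6.0, 4.0),
    ("C", 7.0, 9.0), ("C", 8.0, 8.0), ("C", 9.0, 7.0),
]

# First case: the student is the unit of analysis.
individual_r = pearson([s[1] for s in students], [s[2] for s in students])

# Second case: the school is the unit; data are averaged within school.
by_school = {}
for school, x, y in students:
    by_school.setdefault(school, []).append((x, y))
mean_input = [mean([x for x, _ in rows]) for rows in by_school.values()]
mean_output = [mean([y for _, y in rows]) for rows in by_school.values()]
aggregate_r = pearson(mean_input, mean_output)

print(round(individual_r, 3), round(aggregate_r, 3))
```

Averaging removes the within-school variation, so the school-level correlation is higher than the student-level one even though both are computed from the same records. Neither is wrong; they answer different questions, which is why separate regressions by unit of analysis are called for.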
Occasional Paper No. 5, Page 48, proposes to post the weighting of each 
process variable in the mix in which it occurs and the average weight of a variable 
over all mixes as part of the storage and retrieval of process mix data. This 
hardly seems realistic as a solution to a real problem. In general, it is incon- 
sistent with the flexibility requirement. More specifically, it ignores the 
multiplicity of weights a given variable can receive within a single mix depending 
on the regression analysis in which a weight (b or partitioned variance) was 
estimated. The same variable may have quite different weights for various product 
and impact mixes and submixes, over different analysis units and aggregations 
thereof. The problem would be compounded when trying to average the weights a 
variable receives across process mixes; this is a dubious practice anyway, compared 
with recomputing them from aggregate correlation matrices. 

There may be much more homogeneity of regression in the system than the 
above criticism assumes, but we don't know this. The homogeneity issue can be 
answered by special application of the Bottenberg-Ward procedure or by hierarchical 
grouping of regression equations. It is also possible to group or cluster mixes 
within process space without reference to IPPI relations. 

This will not weight the process variables, but merely classify mixes without
direct interfaceability with other MISOE components. It may, however, be useful
for organizing a listing of mixes with or without their associated cost and weight
information. It may require a separate information storage and retrieval section
with separate but cross-linked addresses for storing the costing and weighting information.






Analysis Involving the Product Space

From an analytic viewpoint, the product space, like Janus, faces in two 
opposite directions. Students within programs have been "processed" and come 
off of the pipeline as "program completors". Data about them may constitute the 
initial set of dependent variables in the analysis of process; impact variables 
constitute a later set. But the product variables indicate the educational 
managers' assumption that product quality is related to impact on societally defined
goals. This assumption is validated using product variables as independent
variables in the prediction of impact, i.e., product-impact analysis. The two
major purposes of product data within MISOE as delineated on Page 79 of Occasional
Paper No. 5 will be served primarily by process-product analysis. However, both
need input controls ("by student type" and/or variance controls for differential
input), and the second is further served by process-impact analysis.

Occasional Paper No. 5 distinguishes gross and specific types of product
data. Gross data, such as the number of students completing a program, would be
obtained for all educational sectors (OE, academic, and general). Similar gross
data should also be obtained for each program about the number of dropouts. Moreover,
the student records should clearly indicate for each student not only the
program entered, but his completion-dropout status. Some so-called "dropouts"
may actually be transfers to another program or even to another sector, where 
they may or may not become "completors". The data system should be able to re- 
flect this reality. 

Occasional Paper No. 5 indicates specific product data will be obtained 
for completors of occupational programs in cognitive, psychomotor, and affective 
(primarily attitudinal) areas. Consideration should also be given to obtaining
some affective data on dropouts and transfers since this may relate to later
employment status and other impact data. Similar reasoning suggests obtaining
some affective data in the non-OE sectors in addition to the GED information 
already planned. 

This kind of thinking implies legitimately missing data for dropouts on
some of the process and product data, because such data are not applicable. The
earlier discussion of missing data refers to data that should be present and
usually are, but are not obtained for some observation units. In the present case,
where process-product data are missing due to noncompletion of a program, no replacement
values should be computed. Care must be taken in regression and other
correlational analysis of process-product, process-impact, or product-impact relations
to perform the analyses on completors only, dropouts only, or, if across all inputs,
to use process and product variables obtained on both completors and noncompletors.
Rational management decisions about which students should enter which programs 
cannot rely solely on gross counts of completors by student type. The specific 
product data, by student type, are also relevant to this kind of decision. It
is the kind of question that requires the combined application of taxonomic and 
discrimination models. One could classify students into types within student space 
and carry student mixes through process-product-impact analyses. The taxonomic 
nuclei could be defined on a random or self-weighted sample of the students in 
the general sample space, assigning all other students by the personnel assign- 
ment algorithms. It would probably be better, however, to define output groups 
based on product and impact data, and to use student data to classify students 
in terms of such output groups, weighting the student data to maximally discrim- 
inate the groups and "assigning" new student cohorts to programs in which they 
will have maximum likelihood of achieving high product scores and be most likely 
to have favorable societal impacts.

Occasional Paper No. 5 notes that the educational process must be flexible
so that improvement is possible, and that product objectives must not be overprescribed.
This is not envisioned to encourage vagueness in specifying objectives,
but to allow objectives to be added or deleted in a program over time, and to
allow variations across schools in specifying objectives. This implies the same 
kind of analytic flexibility in product space as was discussed earlier for pro- 
cess space. Provision must be made not only to obtain product data on unique 
objectives but to ensure storage and retrieval linkage between unique processes 
and products. Moreover, the fact that a product datum refers to a unique ob- 
jective and the process for achieving it needs to be so tagged to ascertain in 
product-impact analysis whatever unique contributions to impact such unique products
may have. Although primary reference here is specific to product data
related to a unique objective, the gross counts of numbers of students completing
a unique objective should be obtained. Moreover, it is well to keep in mind,
for both common and unique objectives, the possibility that the process associated 
with a particular objective may affect other products in addition to the one for 
which it was promulgated.

In some programs uniqueness may be introduced in schools not in the sample.
As soon as possible after this occurs, consideration should be given to including
the school in the sample at the next cohort replacement time for the program.
Some weight adjustments are implied and the feasibility of this notion depends
on the frequency of the occurrence of unique changes, presumably knowable from 
census data on programs. 

Each of the three specific data types (cognitive, affective, and psychomotor
performance) has some measurement and analysis implications. As one possibility
for system expansion, it might be useful to obtain product data not only from
program staff and program completors, but also from employers or supervisors of
those students who were on work-study process plans. Such data might be obtained
in the form of a set of rating scales on all three performance areas, and might
well have some predictive validity for certain impact measurements (e.g., employed
vs. unemployed; on-the-job performance ratings in impact space).




Cognitive measurements by pencil and paper tests will result in distributions
of test scores. Conversion to pass-fail in terms of some cutoff specified
by a cognitive objective loses considerable and useful analytic information. It
will contribute to flexibility to have both the "continuous" and dichotomized
test scores available: the former for regression analyses across spaces, the
latter as an elaboration of the gross accountability data.

It may be instructive to note how the comparison of general educational
development across sectors might be formulated in regression terms. The "full"
model is defined as:

Y = GED product score, the dependent variable
X_1 = Input GED score for input control
X_2 = Dichotomous score for taking Academic track
X_3 = Dichotomous score for taking General track
X_4 = Dichotomous score for taking OE track; program 1, mix 1
X_5 = Dichotomous score; program 1, mix 2
X_6 = Dichotomous score; program 2, mix 1
X_7 = Dichotomous score; program 2, mix 2
X_8, ..., X_13 = X_1 X_2, ..., X_1 X_7: input scores by tracks

It can be simplified or expanded by the gross vs. fine attention to programs and
mixes within occupational education. The first seven vectors might be expected
to be retrievable from observed and stored information. The first vector, from
student space, is included primarily to permit the generation of product vectors,
X_1 X_j. Even without such product vectors, the inclusion of X_1 in both full and
reduced models will give an overall control of a test of some hypothesis involving
the other vectors. Vectors X_2 through X_7 indicate which students went through which
educational channel. The zero-order validity coefficients (r) for Vectors X_2 through X_7
correspond to uncontrolled t-tests for contrasting a particular channel against
all others with respect to the GED product score. Reduced models containing
two or more of these vectors permit tests of contrasts between the pooled channels
retained and those dropped from the full model. Thus, all the OE channels can
be pooled and contrasted with non-OE channels. The vectors X_8 through X_13 permit tests
of homogeneity of regression among two or more channels. This rather special
and simplified example illustrates the power of the regression approach to handle
whole series of ANOVA, ANCOVA, and regression homogeneity problems from one
formulated "full" model. The references cited in Part Three must be consulted
for further details and for a wider range of examples of the power of the regression
model.
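
The full-versus-reduced comparison described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the twelve synthetic students, the three channels (with OE as the reference category so the channel vectors are not collinear with the unit vector), and the function names `ols_sse` and `f_ratio`. It tests whether channel membership adds to the prediction of the GED product score once input GED is controlled; a homogeneity-of-regression test would follow the same pattern with the product vectors added to the full model.

```python
def ols_sse(columns, y):
    """Residual sum of squares from least-squares regression of y on the given columns."""
    k, n = len(columns), len(y)
    # Normal equations (X'X) b = X'y, stored as an augmented k x (k + 1) matrix.
    a = [[sum(ci[t] * cj[t] for t in range(n)) for cj in columns]
         + [sum(ci[t] * y[t] for t in range(n))] for ci in columns]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    b = [a[i][k] / a[i][i] for i in range(k)]
    return sum((y[t] - sum(b[j] * columns[j][t] for j in range(k))) ** 2
               for t in range(n))


def f_ratio(full, reduced, y):
    """F ratio for the vectors present in the full model but dropped from the reduced one."""
    sse_full, sse_reduced = ols_sse(full, y), ols_sse(reduced, y)
    df1 = len(full) - len(reduced)   # number of vectors dropped
    df2 = len(y) - len(full)         # residual degrees of freedom of the full model
    return ((sse_reduced - sse_full) / df1) / (sse_full / df2)


# Synthetic data (hypothetical): input GED score and channel membership.
unit = [1.0] * 12
x1 = [40, 45, 50, 55, 42, 47, 52, 57, 41, 46, 51, 56]
academic = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
general = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]   # OE is the reference channel
noise = [1, -1, -1, 1, -1, 1, 1, -1, 1, -1, -1, 1]
y = [10 + 0.8 * x + 5 * a + e for x, a, e in zip(x1, academic, noise)]

full = [unit, x1, academic, general]
reduced = [unit, x1]
f = f_ratio(full, reduced, y)
print("F for channel effect, controlling input:", round(f, 2))
```

Dropping only one channel vector from the full model would give the contrast of that channel against the others retained, which is the pooling logic described in the text.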

The affective data consist primarily of Likert and Semantic Differential 
scales of attitudes toward self and work. Retests of some personality measurements
(e.g., authoritarianism) pretested in student space may also be helpful, since
personality changes, whether or not attributable to the educational experience, may
be predictive of impact variables. The proposal on Page 88 of Occasional Paper
No. 5 to treat affective data separately from psychomotor and cognitive data is
reasonable when product variables are to be dependent variables in process validation.
When, however, they are used to predict impact variables, it is quite
feasible to combine the three types of product data in analysis and this permits
the examination of possible interactions among product data types in predicting
impacts. Affective data on dropouts and transfers may also be important information
to obtain.

Affective objectives (and therefore data) reference programs, but not 
blocks and units. Stipulation of the objectives "within department faculties"
implies possible variations across schools with the kind of uniqueness problems
discussed earlier (with similar treatment recommendations). Input control is
just as important in analysis of affective data, perhaps more so, as in analysis
of cognitive data. The discussion of Figure 9 in Occasional Paper No. 5 (Page 90)
ignores differential input and leads to an inference that school A fosters better
attitudes than school B, which may be true, but may also be an inferential error.

The performance objectives, largely psychomotor in conception and measurement,
probably involve cognitive and affective components. The use of pass-fail
measurement of achievement of specific objectives in this domain is quite reasonable.
The objectives are like items on a test with item scores determined by one or more
scorers (raters) observing performance, either directly or by reference to video
tapes. The inter-rater reliability of determining pass-fail on a specific objective
is a function of the number of raters; that of a single rater might be as
low as .30. The Spearman-Brown formula can be used to estimate the reliability
of pooled judgements for a given number of raters. If a single rater reliability
is .30, that for two independent raters is about .46; for three, about .56; and
for four, about .63. The use of video tape and discussion between discrepant
raters should improve the reliability of the ratings (not necessarily their
validity) and hence permit the use of fewer judges. Inter-rater reliability
can be used to correct correlations of ratings with other variables for attenuation
from measurement error.
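
The reliability figures quoted above follow directly from the Spearman-Brown formula, as the short sketch below shows. The function names are illustrative only, and the attenuation example uses hypothetical values; the correction shown is the standard one of dividing an observed correlation by the square root of the product of the two reliabilities.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Reliability of the pooled judgement of n_raters independent raters."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


def correct_for_attenuation(observed_r, reliability_x, reliability_y=1.0):
    """Estimate the correlation that would be observed with error-free measures."""
    return observed_r / (reliability_x * reliability_y) ** 0.5


# Single-rater reliability of .30, pooled over two, three, and four raters.
for k in (2, 3, 4):
    print(k, "raters:", round(spearman_brown(0.30, k), 2))   # .46, .56, .63

# Hypothetical: a rating with reliability .36 correlates .30 with an impact measure.
print(correct_for_attenuation(0.30, 0.36))
```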

If the objectives are scalable, the scale scores can be used in analysis. 
In the case of Guttman scaling, reliability of the scaled scores comes out of the 
scaling process itself. For a set of objectives to form a true Guttman scale, 
they must form a unit-rank correlation matrix, i.e., reference a single common
factor. This univocal feature of such a scale provides a score with a very 
narrow band-width with the usual advantage of a clear meaning of what is measured, 
but the disadvantage that the scale will have slim chances of correlating very 
highly with external variables. For this reason, one doesn't often hear of the 
extensive development and use of Guttman scales in large-scale practical programs. 
Nevertheless, this approach is well worth trying for MISOE. It is more likely 
that performance objectives within programs will form one or more "quasi-scales"
in the Guttman sense, and these should be quite useful if reliabilities can be
maintained in the .75-.95 range. It may be helpful to generate a matrix of phi
coefficients (phi/phi-max if objectives vary considerably in difficulty, i.e.,
percent passing), and perform an informal clustering of performance objectives
to ascertain which sets are likely to scale.
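
As a minimal sketch of the phi and phi/phi-max computation for one pair of pass-fail objectives, assuming each objective is stored as a list of 0/1 scores over the same students (the function name and the data are hypothetical). Phi-max is the largest phi attainable given the two pass rates, so the ratio removes the ceiling imposed by unequal difficulty. Both objectives must have some passes and some failures, or the denominators vanish.

```python
def phi_and_phi_max(item_a, item_b):
    """Return (phi, phi_max) for two dichotomous (0/1) score vectors."""
    n = len(item_a)
    p_a = sum(item_a) / n                                   # proportion passing A
    p_b = sum(item_b) / n                                   # proportion passing B
    p_ab = sum(a * b for a, b in zip(item_a, item_b)) / n   # proportion passing both
    phi = (p_ab - p_a * p_b) / (p_a * (1 - p_a) * p_b * (1 - p_b)) ** 0.5
    low, high = sorted((p_a, p_b))
    phi_max = (low * (1 - high) / (high * (1 - low))) ** 0.5
    return phi, phi_max


# Hypothetical pair forming a perfect Guttman pattern: everyone who passes the
# harder objective (40 percent passing) also passes the easier one (80 percent).
easier = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
harder = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
phi, phi_max = phi_and_phi_max(easier, harder)
print(round(phi, 3), round(phi / phi_max, 3))
```

In this example phi itself is well below 1 only because the two objectives differ in difficulty, while phi/phi-max reaches its ceiling, which is the reason for preferring the ratio when screening objectives for scalability.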

The staff recognizes the fact that some objectives will not scale and pro- 
poses a procedure (Figure 7, Occasional Paper No. 5) for assigning unique numbers 
to patterns of achieved objectives. This can be done whether the objectives are
scalable or not. The procedure ensures a unique number will be assigned to each
possible "mix". These numbers, like those on the jerseys of football players,
are nominal; they tag the patterns, but do not scale them. The item numbers
should not be used analytically, but the presence or absence of each pattern
indicated for each student as a dichotomous variable. The pattern number could
be what is stored so that the dichotomous vectors can be readily generated as
needed for analysis. 
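
A minimal sketch of storing a nominal pattern number and generating dichotomous vectors from it on demand follows. The numbering rule used here (treating the pass-fail pattern as a binary number) is an assumption chosen for illustration; it is not necessarily the procedure of Figure 7 in Occasional Paper No. 5, and the function names are hypothetical.

```python
def pattern_number(achieved):
    """Assign a unique nominal tag to a pass-fail pattern (assumed binary coding)."""
    return sum(flag << position for position, flag in enumerate(achieved))


def pattern_indicators(stored_numbers):
    """One dichotomous vector per observed pattern: 1 if the student shows it, else 0."""
    return {p: [1 if s == p else 0 for s in stored_numbers]
            for p in sorted(set(stored_numbers))}


# Three students scored on three objectives (1 = achieved).
students = [[1, 0, 1], [1, 0, 0], [1, 0, 1]]
stored = [pattern_number(s) for s in students]
print(stored)
print(pattern_indicators(stored))
```

The stored numbers serve only as tags; the vectors returned by `pattern_indicators` are what would enter a regression.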

The utility of product data to management is defined on Page 79 of Occasional 
Paper No. 5 in terms of maximum product for given cost and/or least cost process 
to achieve specified products. These questions involve the integration of economic 
analysis with noneconomic interspace analysis, an integration possible when papers 
7-12 have been completed. Dynamic simulation should be useful for resolving
management alternatives, given status quo simulation followed by runs in which
product levels and costs are changed in search of optimizing combinations.

Analysis Involving the Impact Space

The variables of the impact space are to indicate societal values, societal 
action goals to realize those values, and to constitute the ultimate criterion 
space for management policy evaluation and decision making. Although aggregations
of these data over educational and noneducational sectors, and over
schools and programs, are of direct importance, the actual impact data for each
individual will be needed and identified as such for static interspace analyses.
Aggregated impacts will be of maximum interest to legislators and state level
managers. Interactions among impacts (mutually enhancing or constraining them)
are virtually ignored in present planning but may be of interest as MISOE expands.
Aggregated impact information will probably be critical level variables in dynamic
simulation and, indeed, certain kinds of dynamic interactions among impact levels
can be hypothesized and included in formulating dynamic space flow diagrams.

In a narrow sense, impact space measures the benefits in cost-benefit
relations. More broadly, impact is often economic, too, in that there will be
interest in economic benefits, both societal and personal. A further delineation
of this view is part of the anticipated integration of economic and noneconomic 
considerations. 

At the stage of formulation of management inquiry and translation into 
analytic operations, direct interaction between representatives of legislators 
or managers, and MISOE personnel is anticipated to be necessary. Even with 
extensive education of inquirers by MISOE staff over a period of time, it is unlikely
that interrogation of the system can be confined to inquirer manipulation of remote
computer terminals. Certainly that is a useful part of MISOE, but the computer
cannot generate the interim decisions between problem formulation and analysis:
choice of relevant data, selection and ordered application of appropriate models and
algorithms, and interpretation of analytic results. The computer will neither
formulate the flow models nor write the model equations for dynamic simulation.

Much of the initial interaction between MISOE staff and inquirers will 
involve efforts to get exact specification of the problem at hand in terms of: 

1. level of application (i.e., subpopulation referenced by the
inquiry),

2. what is to be optimized, or other goal of the inquiry,

3. what is an acceptable solution,

4. the time by which a goal is to be achieved, and

5. in the case of multiple, related goals, what priorities
are assigned to their achievement.

The discussion of impact space in Occasional Paper No. 5 shows staff awareness
of these issues.

The possibility that relative priorities for multiple goals may be under 
consideration implies for static space analysis that weights (probably ratings 
or rankings) of the relative importance of impact goals be appendable 
to impact data. When predicting a single impact, they will not be needed, but
in formulating and predicting an impact mix, such capability should be selectively 
and flexibly available. Note that different inquirers may have different priori- 
ties. This means that priority weights should not be appended to data within 
the information storage and retrieval system, but be flexibly introducible co- 
jointly with retrieved data in applying an analysis model. Note, too, that such 
weights are in addition to any sampling weights which must be appended to the 
data. 
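
The separation argued for here, with sampling weights stored with the data and priority weights supplied by the inquirer at analysis time, might be sketched as follows. The impact names, the weights, and the function name are all hypothetical; the composite is a simple priority-weighted average of impact scores, aggregated with the sampling weights.

```python
def weighted_impact_mean(impact_rows, sampling_weights, priorities):
    """Sampling-weighted mean of a priority-weighted composite of impact scores."""
    norm = sum(priorities.values())
    composite = [sum(priorities[name] * row[name] for name in priorities) / norm
                 for row in impact_rows]
    total = sum(sampling_weights)
    return sum(w * c for w, c in zip(sampling_weights, composite)) / total


# Stored data: impact scores and sampling weights only (no priorities).
impact_rows = [{"employed": 1, "voted": 0}, {"employed": 0, "voted": 1}]
sampling_weights = [3, 1]

# Two inquirers bring different priorities to the same retrieved data.
legislator = {"employed": 2, "voted": 1}
civic_group = {"employed": 1, "voted": 3}
print(weighted_impact_mean(impact_rows, sampling_weights, legislator))
print(weighted_impact_mean(impact_rows, sampling_weights, civic_group))
```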

It is recognized that impact data may come from several sources including 
gross summary data from other agencies, and more specific followup of "alumni" from 
academic, general, and occupational education channels, including dropouts. In addition
to employment and citizenship information, it would be desirable to ascertain
the location and mobility of former students, both substantively and as an aid to
updating name and address files. Impact mixes can be formulated as person vectors 
on impact scores, and at aggregate levels, mean vectors appended to aggregate 
vectors of data from other sources. These two kinds of impact mixes that can be 
combined at more aggregated levels correspond to the direct-personal vs. indirect- 
societal dimension of impact classification shown in Figure 2 of Occasional Paper 
No. 5. The immediate impacts are more closely associated with process and product
data and to the transition between pre- and post-impact distributions, but
can be treated analytically in the same way as the longer range impacts developed
from followup data.

From an analytic viewpoint the process of setting societal and related 
impact goals is "a given" underlying the problem for analysis. The process of
goal-setting is described for analytic purposes quite adequately in Occasional
Paper No. 5, at least for anticipated impacts. Insofar as unanticipated impacts
(i.e., those not related to specific educational goals) are in fact anticipated 
as possible conditions for former students, and are measured, they pose no serious 
problem for static analysis. Their levels, however, may be important to include 
in dynamic simulation models. 

The educational pre-impact space is defined as that for storage of "existing"
levels and rates (i.e., ratios, not necessarily rates in the dynamic simulation
sense) of impact variables. The "initial data" collected on impact variables
from previously "processed" students and from other sources can be stored here. 
As a cohort in each program moves out of educational space, their impact data 
and concurrently updated data from other sources can be placed in post-impact space. 
As the next cohort moves out of educational space, impact data on the previous 
cohort can be moved to pre-impact space as "existing" data, replacing the "initial 
data" and making room for the post-impact data on the new cohort. This will 
automatically preserve and update the distinction between pre- and post-impact 
spaces as MISOE operates over time. It may be desirable to retain cohort data retired
from pre-impact space externally on tape for archival reference, if necessary.
In any case, the impact data time and cohort must be clearly indicated in the information
storage and retrieval system.
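
The rotation of cohort data between pre- and post-impact spaces can be sketched as a small data structure. The class and attribute names are hypothetical, and the archive list merely stands in for the external tape mentioned above; the point is that each stored block carries its cohort label.

```python
class ImpactStore:
    """Holds pre-impact and post-impact data and rotates them as cohorts exit."""

    def __init__(self, initial_data):
        self.pre_impact = {"cohort": "initial", "data": initial_data}
        self.post_impact = None
        self.archive = []   # stands in for archival tape storage

    def cohort_exits(self, cohort_label, impact_data):
        """Record impact data for a cohort leaving educational space."""
        if self.post_impact is not None:
            # The previous cohort becomes the "existing" data; retire what it replaces.
            self.archive.append(self.pre_impact)
            self.pre_impact = self.post_impact
        self.post_impact = {"cohort": cohort_label, "data": impact_data}


store = ImpactStore({"employment_ratio": 0.80})
store.cohort_exits("1972", {"employment_ratio": 0.83})
store.cohort_exits("1973", {"employment_ratio": 0.85})
print(store.pre_impact["cohort"], store.post_impact["cohort"], len(store.archive))
```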

Many analytic considerations involving impact data have been discussed in 
earlier sections of this paper, because analysis of other spaces will often involve 
the impact data. There remains the issue that certain post-impacts will be credited 
to (or blamed on) educational processes and products, even with input controls, 
where the observed impact may be the result of ongoing cultural and economic processes.
The effects could occur between program completion and long-range
impact measurement, and differentially by the locale in which a completor
lives. No attempt will be made here to cope with the potential risk of inferential
error so introduced, nor should the staff be unduly concerned (although
aware) with coping with it in early development and implementation of MISOE.
Changes observed in the noneducational control group will provide some clues to
the nature and extent of the problem and expanded MISOE can be designed to cope
with it, if necessary.

The Educational Human Input and Student Spaces 

The staff has deemed it convenient to distinguish a societal resources 
space from the original input space, now called educational resources space (Figure 
1, Occasional Paper No. 5). Each is subdivided into the human and economic 
components: the educational human input space and the student space, respectively. 
Since essentially the same variables and observation units (those who become 
students) give rise to similar aggregations and mix formulations, the two spaces 
may be considered as one for most analytic purposes. Nevertheless, the interest 
of state level managers will focus on the educational human inputs, implying 
emphasis therein on the aggregated data over all educational sectors, schools,
and programs, and over certain demographically defined submixes. Initial emphasis
will be on status quo distributions and a priori grouping of mixes. One may also 
anticipate that some demographic, ability, and achievement measurement at late 
primary school levels could be involved. If such a distinction between educational 
human input and student input information is contemplated, the information storage 
and retrieval system should reflect it along with the capability of tagging at 
the individual level who enters which educational channel. 

The student input information is required to characterize the sorting out
of students into channels, whether this tracking is accomplished via student
choice, via administrative control, or a combination of the two. As part of the
process data, any cutoffs for entry to a program should be posted for possible
use as constants in rate equations in dynamic simulation. The student input
information is also required to control analysis validity for differential input
as discussed earlier. An analytical question arises with regard to input control
through methods discussed in Part Three vs. conducting analysis within
"student types". The more finely defined the student types, the less need there 
will be for the variance controls on input, but also the smaller the number of 
observation units on which the analysis can be based. Analysis within moderately 
gross classifications of students should use the variance controls for within- 
group heterogeneity. 

The full pattern of characteristics and descriptions for each student, 
including his educational channel, constitutes a total mix from which a variety 
of submixes should be flexibly derivable for different purposes. For purposes 
of some state level policy makers, mixes of demographic, ability, and achievement 
measures may often be all that is needed; larger submixes may be needed (including 
personality data) for regional policy makers, "program directors", and for analytical
controls. Aggregation and classification of mixes for state level analysis
may reasonably be rather gross and may involve categories along continuous dimensions
(e.g., high, medium, low IQ). In the early implementation of MISOE the
classification of "student types" for educational human input may be a priori,
to be replaced by a more sophisticated taxonomy based on IPPI relationships to
be developed as early cohorts go through. 

The taxonomy of student space involves so many students and so many vari- 
ables that some a priori grouping in terms of educational sectors may be helpful. 
If hierarchical grouping is used to define the types, the grouping should be based 
on a distance matrix with the objective function of maximizing the ratio of the 
among-groups sums of squares of distances to the within-groups sums of squares
of distances. The grouping is apparently insensitive to whether D or D^2 is used,
or whether distances are computed on the score vectors (Cronbach's D^2) or on their
principal components (Mahalanobis' D^2).






Analysis Across IPPI Spaces 

Most analyses of any substantive value beyond distributional description 
and aggregations on particular variables and mixes will be across the IPPI spaces. 
Because so many variables are involved for which interaction vectors may be 
meaningfully generated, regression analysis across spaces should first be performed
so that the stepwise algorithm can reduce the number of relevant variables.
Then, plausible interaction vectors can be generated involving the selected
variables in a more manageable "full model" regression analysis, and appropriate
"reduced models" developed to test particular hypotheses. It is likely that
"full models" with pertinent interaction terms will provide the kinds of parameters
needed in dynamic simulation. There would seem to be no a priori reason
why economic and noneconomic data could not be included in regression analysis,
thus providing additional clues to formulating and interrelating these two types
of information in simulation modeling. There remains, however, a need to clarify
the contrast between weighting economic data by regression and doing so by the
Koopmans structural equation in which the estimated parameters are not regression
weights, but are "elasticity coefficients". (See van de Geer, "Introduction to
Multivariate Analysis for the Social Sciences", W. H. Freeman and Company, 1971.)
It may be that including economic data in regression analysis will be useful to
help identify the important variables, whose levels need to be included in simulation
models, but to use elasticity coefficients, rather than regression coefficients,
in rate equations. The two sets of weights are contrasted by the
different optimizing functions defining their estimation. (To add to the confusion,
both sets of weights are called b-weights.)

Much has been indicated in the earlier discussions of the MISOE capabilities, 
subsystems, and analysis controls, for inferential error bearing on static analysis 
across subsystems. Such analysis presumes the interconnectability of data across
subsystems, ensured by logistics of data collection, tagging, and cross-referenceability
in the information storage and retrieval system. Need for interconnectability
applies to noncompletors as well as completors, and is further required
for dynamic simulation as well as in static analysis.

Analysis performed in support of overall agency management decisions will 
generally be rather gross and limited by the kinds and quality of information 
available from other agencies. Analysis performed in support of management 
decisions over all education will involve both gross and specific data. Compara- 
tive analysis in terms of data types common to academic, general, and occupational 
education (i.e., input, product, and output) will be required. Analyses performed 
in support of particular program management, for LEA's, and for general occupational
education management will be more specific and detailed with input-process-product-impact
forms. It is anticipated that these remarks will apply to both static
and dynamic analyses.

As soon as possible with available interfaced data, full model regressions
should be set up and completed, rather than waiting for specific inquiries, so
that information from the regression analyses can be used to identify important
parameters. This information will also aid the staff in its further development 
of MISOE, in its interaction with managers in formulating inquiries to the system, 
and in formulating simulation models. 

This completes the discussion of noneconomic, descriptive analyses in 
static space. The next part turns to the topic of simulation, especially in
dynamic space, but will also include certain alternative considerations for analysis. 




Part Five. Simulation Models 

Introduction

This part considers the extensive staff commitment to simulation as a
major analytic aspect of MISOE. The commitment, as expressed in Occasional Papers
No. 3 and 6, is focused on the Forrester type models for dynamic simulation. Such
models do, indeed, have a serious claim to be useful for MISOE, but certain 
limitations and possibly unknown characteristics of such models and their atten- 
dant applications in the MISOE context suggest an excessive staff emphasis 
thereon. Moreover, some types of inquiries can be anticipated for which static 
analysis or linear programming would be indicated and adequate. 

For MISOE to have the general and flexible capability envisioned for the 
system, serious staff consideration should be given to static simulation, to 
linear programming, and to other kinds of solutions for some kinds of problems, 
and to other possible capabilities discussed in the extensive literature of
operations research and of econometrics. To be sure, much of this additional 
capability, including the necessary software, will be available for economic 
analysis, but the point is that these tools may be also used for noneconomic 
analysis and for analysis which combines the economic and noneconomic concerns. 
It is in problems with strong nonrecursive, nonlinear, and temporal flow features 
where dynamic simulation will be most clearly indicated. 

It may be instructive to consider an example of an inquiry which does not 
require dynamic simulation for its solution. The manager of occupational educa- 
tion wants to increase the quality of automechanics without changing the numbers 
or kinds of students entering the automechanics program. He conceives his problem
as one of increasing the number of students with a Guttman product mix of
"5". How is he to do this? MISOE approaches the problem by doing a stepwise
regression analysis using the Guttman product mix score as the dependent variable,
controlling for input, and using process data as independent variables. The
students in all schools in the sample having automechanics programs are the
observation units.

For simplicity in a didactic example, suppose two process variables
were shown to predict the product variable: X1, the square feet of floor space
in the automechanics laboratory attended by the student, with a moderately
positive regression weight; and X2, the number of students on an engine, with
a larger, but negative, regression weight. At this point, we know the critical
variables for the manager to manipulate and, more specifically, that he will get
more students with high product scores if he provides more engines in the laboratories,
or if he provides more floor space (presumably so students working on
adjacent engines are not bumping into each other). It also appears that providing
more engines will be more effective in raising the quality of the student product
than providing more floor space.

While helpful, this is quite inadequate. The formulation of the original
inquiry was vague with respect to the nature or extent of the increase in product
"5" students. Nor was any cost constraint imposed. Note, too, that one of the
variables that "made a difference" (floor space) constrains the other, i.e.,
you can provide additional engines if you have enough floor space for them. It
is doubtful that the manager would have thought of the latter until the "important"
variables had been identified by the regression analysis. Even if he had defined
the problem as getting a specified increase in the number of product "5" students
for least cost, MISOE would now know which economic variables were most relevant
(e.g., engine costs, costs of adding wings to school buildings, etc.). With such
a least cost formulation and clues to the relevant variables, MISOE might recognize
this as a linear programming problem, to be solved with the simplex algorithm,
minimizing the total cost under the constraint that the number of square feet of
floor space for a given number of engines is more than some specified constant.






61 



The solution (if it exists) would specify the optimal levels of X1 and X2
in the sense defined.
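
The least cost formulation just described can be illustrated with a small sketch. It is not the simplex algorithm: with only two decision variables the optimum can be found by enumerating the corner points of the feasible region, which is the technique used here. All numbers (dollars per square foot, dollars per engine, the gain weights, the required floor space per engine, and the present stock of 4000 square feet and 15 engines) are hypothetical values chosen for illustration, as is the function name.

```python
from itertools import combinations

def solve_2var_lp(cost, constraints, eps=1e-9):
    """Minimize cost[0]*x + cost[1]*y subject to a*x + b*y >= c
    for every (a, b, c) in constraints, by checking corner points."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel boundary lines never intersect
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y >= c - eps for a, b, c in constraints):
            value = cost[0] * x + cost[1] * y
            if best is None or value < best[0]:
                best = (value, x, y)
    return best  # None when no feasible corner exists

# x = added square feet of floor space, y = added engines (hypothetical data)
constraints = [
    (0.001, 0.5, 10.0),     # predicted gain in product "5" students >= 10
    (1.0, -200.0, -1000.0), # (4000 + x) >= 200 * (15 + y) sq ft per engine
    (1.0, 0.0, 0.0),        # x >= 0
    (0.0, 1.0, 0.0),        # y >= 0
]
total_cost, added_floor, added_engines = solve_2var_lp((30.0, 2000.0), constraints)
```

With more than two variables the corner enumeration becomes impractical, and a simplex routine of the kind mentioned above would be the natural choice.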



If the manager's inquiry were purely exploratory about the change in
product mix distribution, preparatory to asking the cost of a specified increased
product, MISOE could provide distributions of predicted product scores under
status quo and under manipulated changes in X1 and X2. The distributions would
be grouped by cutting scores defined by equicentile conversion against the actual
status quo distribution (to allow for the regression effect) or by converting
status quo predicted scores into stanine form. In effect, this would be a kind
of static simulation of the effects of manipulated changes in X1 and X2 on the
product distribution.
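
A minimal sketch of such a static simulation follows. The regression weights, the cutting scores, and the four student records are hypothetical; in practice the cutting scores would come from the equicentile or stanine conversion described above rather than being supplied by hand.

```python
from collections import Counter

def predicted_score(floor_space, students_per_engine, weights):
    """Apply an assumed two-predictor regression equation."""
    b0, b1, b2 = weights
    return b0 + b1 * floor_space + b2 * students_per_engine

def score_category(score, cuts):
    """Category 1 lies below the first cutting score, and so on upward."""
    return 1 + sum(score >= c for c in cuts)

def distribution(students, weights, cuts, d_floor=0.0, d_students=0.0):
    """Count students per product category after a manipulated change."""
    return Counter(
        score_category(
            predicted_score(f + d_floor, s + d_students, weights), cuts)
        for f, s in students
    )

# Hypothetical data: (square feet of floor space, students per engine)
students = [(2000, 5), (3000, 4), (4000, 3), (4000, 2)]
weights = (3.0, 0.0005, -0.4)   # intercept, floor space, students per engine
cuts = [1.5, 2.5, 3.5, 4.5]     # assumed cutting scores for categories 1-5

status_quo = distribution(students, weights, cuts)
one_fewer = distribution(students, weights, cuts, d_students=-1.0)
```

Comparing the two counters shows how many students move into the top category when each engine is shared by one fewer student, without any reference to time.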

Dynamic simulation would be required if the manager's inquiry were made 
at a more sophisticated level. For example, the manager might specify that 
certain changes were to be made and might want to know how long it would take to 
reach the output distribution sought; or he might have to weigh his decisions
about automechanics in a context of similar decisions in other programs, or in 
regard to other outcomes for automechanics. It is also quite conceivable that 
some "alumni" from the automechanics programs become supervisors of future
students either as teachers in occupational education or as supervisors in work-study
programs and that process information turns up "important". With such
temporal, mutually constraining, or feedback complications as these, dynamic 
simulation might well be necessary. 

Before passing to a more detailed consideration of dynamic simulation, 
the attention of the staff is called to a linear programming approach to "assigning 
personnel to jobs". The logic of formulation and analysis is quite general so






In this case, it is rather obvious without such analysis that providing
more engines costs less than adding wings to school buildings; the manager
would do so up to the present floor space limitations. Such would not be the case,
generally, with more variables, or with costs positively related to "importance".



62 

that "counseling" or "classification" can be substituted for "assigning" and 
"training channels" or "programs" can be substituted for "jobs". Usually, some product 
or productivity measure is maximized rather than costs minimized. Some useful 
references to such analysis, potentially useful to MISOE in dealing with inquiries 
about matching student mixes to programs, are: 

"Methods of Solving Some Personnel-Classification Problems", D. F. Votaw, Jr.,
Psychometrika, 17, No. 3, 1952.

"Assignment of Personnel to Jobs", D. F. Votaw, Jr. and John T. Dailey,
Research Bulletin 52-24, Air Force Personnel Laboratory, Lackland
AFB, Texas, August, 1952.

"An Approximation Method of Solving the Personnel Assignment Problem", D. F.
Votaw, Jr., and John M. Leiman, Technical Memorandum 56-14, Air Force
Personnel Laboratory, Lackland AFB, Texas, July, 1956.

In addition to these references, a mimeo paper, "The Counseling-Assignment Problem",
by Joe H. Ward, Jr. at the Personnel Laboratory at Lackland, and a Master's thesis 
by Donald Fink, presumably available from the Engineering Science School or library 
at Johns Hopkins University, are relevant. Most of this literature was developed 
for the Air Force and its personnel and training problems; there, matching person- 
nel with programs involves quotas to be filled and constraints on the number of 
training slots available. 
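
To make the assignment idea concrete, the following sketch matches three students to three programs so as to maximize a total productivity measure. It uses exhaustive enumeration of all one-to-one assignments, not the methods of the references cited above, and is practical only for very small problems. The productivity table and the function name are hypothetical.

```python
from itertools import permutations

def best_assignment(productivity):
    """productivity[i][j] = assumed payoff of placing student i in program j.
    Returns (best total, tuple giving the program for each student)."""
    n = len(productivity)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(productivity[i][perm[i]] for i in range(n))
        if best_total is None or total > best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm

# Hypothetical payoffs: rows are students, columns are programs
table = [
    [9, 2, 7],
    [6, 4, 3],
    [5, 8, 1],
]
total, programs = best_assignment(table)
```

Quotas on training slots would enter as a restriction on which assignments are enumerated; a linear programming formulation handles such quotas far more efficiently.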
General Consideration of Dynamic Simulation 

The next few sections will discuss various issues concerning dynamic
simulation using Forrester type of formulation and the Dynamo capability. In 
this section we consider some general features of dynamic simulation with emphasis 
on kinds of models and on the flowcharting formulation of models. In subsequent 
sections, we consider equations and data sources, inferential errors, and other
issues that have arisen in staff discussion. 
Dynamic simulation models may be either general (gross) or specific (fine).


The objective function could be to maximize the productivity-over-cost ratio.



63 

This distinction may refer either to the order of the model (number of levels or 
rates) or to the magnitude of the time units. Inquiries from a state level 
manager over agencies or over all education are likely to involve relatively 
gross models, at least initially. More elaborate, fine-structured models may
be involved, however, at those levels as managers become more aware of the
importance of details in subsystems. Somewhat finer simulation models may be
anticipated to answer inquiries from regional managers over educational sectors
and programs. Model complexity is obviously a function of the number of conservative
subsystems included, such as those dealing with programs, personnel, or costs.

Discussions with the staff revealed an expectation that some dynamic
simulation models could be predesigned for call with specified parameters. In
so far as this is practical, i.e., certain completely general and specifiable
models can be developed for clearly anticipated general forms of inquiries, the 
notion is an attractive one. It would seem more likely that variations in the 
detailed nature of the inquiries received by MISOE will imply variations in the 
details of the simulation flow charts and equations. If this is correct, much 
greater flexibility will be needed in formulating dynamic simulation models than 
one would have from a small set of prepackaged models. The latter in flow chart 
form may be initially helpful as a communication device with managers, and as a 
nucleus chart for the staff to elaborate in formulating specific models for 
specific inquiries. 

As the staff accumulates experience in designing models for answering 
specific inquiries, portions of these models may be used as modules, which can 
be put together in various ways to form the initial flow charts for future inquiries. 
In this approach the same level and rate equations may be used in the new models 
wherever those levels and rates are not changed by their connections to other 
levels and rates.* Some types of modules that may thus develop over time, and 
may be repeatedly used include student flow subsystems, economic allocation subsystems,



*This suggests that the user-defined MACROS in DYNAMO may be useful to MISOE.



64

and certain kinds of information loops. It is likely that different
modules developed in connection with an inquiry from some level of management will
be most useful for inquiries coming from the same management level. How feasible or
how helpful such a modular approach may be in MISOE requires further consideration
and, possibly, actual operating experience. It is suggested here as a compromise
between having a repertoire of a few general models and having to derive ad hoc 
models from scratch for every inquiry. 

Simulation runs with a given model may be classified as runs of the status 
quo, or as runs involving promulgated changes. Status quo runs should be made 
with any general model which may be feasible; it is likely that they will be the 
first runs with any model, to establish the behavior of the system as a base for 
comparing the results of any changes. Most changes may be expected to imply
changes in parameter cards initiating levels and constants, without requiring any
changes in the flow charts. It is conceivable, however, that a sophisticated
manager may promulgate changes which will change the flow charts (e.g., he may
decide he wants to use additional available information to influence one of the
rates). Thus, the interaction between a manager and MISOE may stimulate his
thinking after he has seen the results of status quo simulation, or of runs 
reflecting simpler changes. 
Equations and Data Sources 

This section considers the levels, rates, and rate modifiers in dynamic 
simulation. For each we consider how they enter a flow chart, how corresponding
equations are formulated, and what kinds of data are to be retrieved from the in- 
formation system. Some potential uses of DYNAMO functions will also be considered. 

The level variables in a flow chart will come first of all from the concerns 
expressed in an inquiry. Where these involve different kinds of data (in terms
of observation units, or economic vs. noneconomic data types), the flow chart
should reflect these as nucleus levels for different subsystems, connectable
within subsystems with level variables of the same kind, and connectable between
subsystems only by information links. The nucleus levels within a subsystem should



65 

then be supplemented with level variables that are inputs and outputs for each 
nucleus level. For every level variable, all of what flows in and out of that 
level should appear either as other level variables in that subsystem, or as 
sources and sinks. Moreover, if a level variable consists of several categories 
(e.g., the number of good citizens includes those entering, already in, and 
leaving an educational channel), and if it is necessary to keep track of such 
categories, one should ensure that all categories are accounted for in the 
definitions of levels with appropriate inputs and outputs, and that categories 
selected for explicit attention be mutually exclusive and without direct flow
between them. Violations of these principles may lead to awkward or even in- 
accurate flow charts. 

Values for the level variables will typically be frequency counts, or 
averages over some defined aggregation, ratios, or probabilities. Ratios are 
often called "rates" whether or not they are with respect to time. Those not
with respect to time must be either levels or constants in dynamic simulation;
those with respect to time may be levels, constants, or rates in the sense the latter
term is used in dynamic simulation. Probabilities are ratios, expressing relative
frequencies. Initial values of level variables for a simulation run are set by
type N equations punched into DYNAMO control cards. There must be a level equation
for every level in the system (except for sources and sinks). Each equation will 
be of the form: 

L    Level.K = Level.J + DT (sum of all RATES.JK controlling flow
     into the level minus the sum of all RATES.JK controlling
     flow out of the level)
There must be as many rate terms in the parentheses as there are input and output
channels to and from the level. DT is the simulation time unit, not the 
units of time in which rates are measured. Such units (including that of DT) must 
be consistent, conversion constants being used to ensure the consistency. All 




66 



sources and sinks are indefinitely large and unspecified levels, but should appear 
in the flow diagram, connected to definite, specified levels by rate symbols and 
rate equations. 
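
The level equation above amounts to a simple step-by-step accumulation, which can be sketched outside DYNAMO as follows. The enrollment example, its parameter values, and the function names are assumptions made for illustration; the source (entering students) and the sink (students who have left) are left unspecified, as in the text.

```python
def step_level(level_j, inflow_rates_jk, outflow_rates_jk, dt):
    """LEVEL.K = LEVEL.J + DT * (sum of inflow rates - sum of outflow rates)."""
    return level_j + dt * (sum(inflow_rates_jk) - sum(outflow_rates_jk))

def run_enrollment(level0, entering_per_year, leaving_fraction_per_year,
                   dt, years):
    """Hypothetical enrollment level with one input and one output channel.
    All rates are per year; dt is the solution interval in years."""
    level = level0
    for _ in range(round(years / dt)):
        inflow = entering_per_year                  # rate from the source
        outflow = leaving_fraction_per_year * level # rate into the sink
        level = step_level(level, [inflow], [outflow], dt)
    return level

final_enrollment = run_enrollment(40.0, 50.0, 0.5, dt=0.25, years=20)
```

Note that dt only sets how finely the run is stepped; both rates remain expressed per year, which is the consistency of units the text requires.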

A rate symbol must appear between any two level variables in a flow 
diagram; a given level variable will have as many rate symbols attached as there 
are levels in direct connection with the given level. The rate values are not 
given directly to the computer as such, but are supplied through the rate equations 
and their modifiers; these, in turn, may be constants stating the rates directly,
but usually are not, because the rates will not, in general, be constant throughout
a simulation run but modified by the dynamics of the system. All factors,
levels, constants, or other rates which can affect a given rate must be connected
to the symbol for that rate by information lines in the flow diagram. Failure 
to do so may lead to confusion and to incomplete rate equations. 

The formulation of rate equations poses the greatest challenge to the 
analyst; he cannot rely on the source of an inquiry to provide the factors that 
might affect rates, but must imagine, or determine in static analysis, what is
important and ensure its representation in the flow diagram. Moreover, he must
ensure that he has appropriate information about how factors affect rates. The
curve fitting of trends data or introduction of tabulated functions may supply
some of the necessary data expressive of such relations. More likely, though,
the analyst will have to write a tentative, gross rate equation and then write
auxiliary modifying equations that elaborate the major terms in the rate equation.
When these auxiliary functions are evaluated and substituted back into the rate
equations by DYNAMO, the rate equations are then fully specified and solved. 

The rate equations are what give the simulation its dynamic aspect and all 
rate equations must have the basic common time unit of interest in the denominator.
The general form of a rate equation is:

R    Rate.KL = a function of (levels and constants) / time






67 



The levels and constants may be information about levels and constants from other
subsystems having different observation and flow units, or there may be many of them
impinging on a given rate, so that auxiliary modifying equations must be used. Modifying
equations may take any form, provided that, when substituted back into the rate
equation, they jointly preserve the unit dimensions as well as algebraic form. This
principle not only checks the consistency of the equations, but also guides formulation
of a complete and dimensionally consistent set of auxiliary equations.

The constants appearing in rate and auxiliary equations are specified 
as constant equations of the form C = k, where k is some value supplied from static
space observation, analysis, or computation in static space. For example, they
may be regression weights, partitioned variances, or ratios between retrieved
aggregates. They may also be conversion constants, either in the sense of converting
units of the same kind (a metric conversion) or in the sense of converting flow units
in one subsystem to those in another (e.g., $/person).

Constants may also represent delay or adjustment times required for some 
information to feed back to affect a rate. It is conceivable that such times may
be variable, rather than constant, and depend on system dynamics. Where this is 
the case, the delay time would be formulated as a level variable connected by 
rates to other levels affecting it, with the appropriate information loops indi- 
cated in the flow diagram. The delay functions provided in DYNAMO should be 
useful for the more common exponential delays, DLINF1 and DLINF3 for information
delays, and DELAY3 for conservative flows of personnel and resources.
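
For readers unfamiliar with exponential delays, the following sketch shows the usual textbook construction of a third-order delay of a conserved flow: three first-order stages in cascade, each with one third of the total delay time. It is an illustration of the idea only and is not DYNAMO's own code; the function name and the numerical values are assumptions.

```python
def third_order_delay(inflow, delay_time, dt, steps):
    """Cascade of three first-order stages, each with time constant
    delay_time / 3. `inflow` is a function of time returning a rate.
    Returns the list of output rates and the final stage contents."""
    stage_time = delay_time / 3.0
    stages = [0.0, 0.0, 0.0]  # quantity in transit within each stage
    outputs = []
    for k in range(steps):
        rate_in = inflow(k * dt)
        outs = [s / stage_time for s in stages]  # outflow rate of each stage
        stages[0] += dt * (rate_in - outs[0])
        stages[1] += dt * (outs[0] - outs[1])
        stages[2] += dt * (outs[1] - outs[2])
        outputs.append(outs[2])
    return outputs, stages

# Hypothetical: 10 trainees per year entering a 3-year pipeline
outputs, in_transit = third_order_delay(lambda t: 10.0, 3.0, 0.1, 500)
```

With a constant input the output rate rises gradually to the input rate, and the quantity in transit settles at the input rate times the delay time, so nothing is lost from the conserved flow.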

Goals and constraints may appear in rate equations, either as constants 
or variables, but are more likely to appear in auxiliary modifying equations. If
variable, they would have to be treated in a manner similar to that for variable 
delays, as indicated above. Generally, the difference between a goal, or con- 
straint, and the actual level of a variable would appear as a factor in a rate 
or modifying equation. To maintain dimensional consistency, it may be necessary 
to express such a difference as a ratio of the difference to the goal (or constraint).



68 

This would be the case where such a function were to be multiplied in the rate 
equation by a factor already having the proper dimensions for the rate function; 
the factor for the difference between the actual and goal level should then be
dimensionless to preserve those dimensions.

Although somewhat speculative at this point, it may be instructive to 
envision some potential uses of DYNAMO functions described in Chapter 8 of 
Forrester's Principles of Systems. The computational functions SQRT, EXP, and
LOGN might be required if some equation involving these functions were fitted
to data in static space; for noneconomic data, at least, this does not seem to
be likely. The interpolation functions might be needed to obtain a constant for
an auxiliary equation from a table of constants dependent on the value of some
level, where the value sought is not tabulated.

The STEP function might be quite useful if some subsystem, not now connected, 
were anticipated to come into play at some specified future date. E.g., one wants
to show a legislator what happens if requested funds become available next year instead
of five years from now; the result may be different and the effect lag may
not differ by 4 years. It might also be useful in a situation like that described
on page 60, where the manager decides to buy more engines until present floor space
constrains him; he may decide that the product quality is still not good enough and
then decide to manipulate both floor space and engines.

The RAMP function might come in where the legislature decided to start
funding a new program at a certain level and indicated that it would probably
increase the funding steadily for several years. It is not clear how the trigonometric
functions might be useful.

The noise generators should be useful in studying the effects of random 
variations in system parameters on a simulation model. Where mean values are 
used to initiate levels or as constants which enter rate equations, their standard 
deviations, standard errors of measurement, or sampling errors, with the mean and 



69

NORMRN function modifying rates could be used to introduce these variations into
dynamic simulation. Such variations may be considered as part of the "reality"
being simulated, or may be used to estimate error effects on simulation runs. 
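
The kind of comparison suggested here can be sketched as follows, using Python's standard normal generator as a stand-in for a noise function of the NORMRN type. The single-level model, its parameter values, and the function names are hypothetical.

```python
import random

def final_level(level0, mean_rate, sd_rate, dt, steps, rng=None):
    """Accumulate a level under a constant rate, or under a rate drawn
    from a normal distribution at each step when rng is supplied."""
    level = level0
    for _ in range(steps):
        rate = mean_rate if rng is None else rng.gauss(mean_rate, sd_rate)
        level += dt * rate
    return level

def noisy_runs(n_runs, seed, **model):
    """Repeat the noisy run n_runs times with one seeded generator."""
    rng = random.Random(seed)
    return [final_level(rng=rng, **model) for _ in range(n_runs)]

model = dict(level0=100.0, mean_rate=5.0, sd_rate=1.0, dt=0.5, steps=20)
deterministic = final_level(**model)
runs = noisy_runs(200, seed=1, **model)
spread = max(runs) - min(runs)
```

The spread of final values across the noisy runs, set against the single deterministic value, gives a first impression of how far sampling or measurement variation in a rate can carry through a run.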

The logical functions should prove quite useful, especially where rates 
will be functions of comparative magnitudes not otherwise expressed as ratios or
differences. For example, the difference between a level and some goal may be
a term in a rate equation; when the level surpasses the goal, it may be desirable
not to let the negative difference affect the rate, but for the difference term
to be evaluated as zero. This can be accomplished by MAX (or MIN, reversing P
and Q parameters), by making the difference term equal to Q: MAX (0, G-L). CLIP
will be more frequently useful, especially for allocation and resource limitation
controls on rates. For example, the rate at which engines can be added to the automechanic
laboratories depends on dollars available for engines, but only as long
as the amount available exceeds the cost of a single engine: R = f(CLIP(DAV.K, 0,
DAV.K, DPE)), where DAV.K is dollars available for engines, and DPE is a constant
cost per engine. SWITCH is a specialized form of CLIP.
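
The two uses just described can be sketched in a few lines. CLIP is written here in the form commonly documented for DYNAMO, returning its first argument when the third is at least as large as the fourth and its second argument otherwise. The function f is not specified in the text; dividing the usable dollars by the cost per engine is an assumption made only so the example yields a rate in engines per period.

```python
def clip(p, q, r, s):
    """CLIP(P, Q, R, S): P when R >= S, otherwise Q."""
    return p if r >= s else q

def gap_term(goal, level):
    """MAX(0, G - L): the shortfall from the goal, never negative."""
    return max(0.0, goal - level)

def engine_purchase_rate(dollars_available, dollars_per_engine):
    """Assumed form of f: usable dollars divided by cost per engine."""
    usable = clip(dollars_available, 0.0,
                  dollars_available, dollars_per_engine)
    return usable / dollars_per_engine

rate_with_funds = engine_purchase_rate(5000.0, 2000.0)
rate_short_of_funds = engine_purchase_rate(1500.0, 2000.0)
```

When funds fall short of the price of one engine the rate drops to zero rather than to a fraction of an engine, which is the resource limitation control the text has in mind.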
Inferential Errors in Dynamic Simulation 

This section considers various sources of error in dynamic simulation. 
Presumably, errors in initiating values may result in a distorted picture of the 
system after a period of simulated change. Unless the operations research literature 
or studies by persons working extensively with dynamic simulation exist, showing
the effects of error, MISOE should conduct sensitivity studies to clarify this
matter. It is plausible to expect the effects of error to be most severe at the
beginning of a simulation run, and gradually diminish as the length of the run
increases. Not only may this not be the case with some models, but we need to
know how long it takes for error effects to become negligible, when it is the
case. The answers to these questions may well be different for different kinds
of models, in terms of their complexity, data types, order, number of information


loops, delays, and actual rates. Presumably, the presence of negative feedback



70 



loops would be favorable to fast dampening of error effects, while positive loops
may exacerbate them. In any case, in view of the strong commitment to use dynamic
simulation in MISOE, it is strongly recommended that the staff search the technical
literature and perform whatever experiments are required to clarify
these issues.
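
One experiment of the recommended kind can be sketched very simply: run the same first-order model twice, once with the correct initial level and once with an erroneous one, and track the gap between the two runs. The model, in which the rate is proportional to the difference between a goal and the level, and all parameter values are hypothetical.

```python
def simulate(level0, gain, goal, dt, steps):
    """Rate = gain * (goal - level). A positive gain gives a negative
    (goal-seeking) loop; a negative gain drives the level away from the
    goal, i.e., a positive loop."""
    level, path = level0, [level0]
    for _ in range(steps):
        level += dt * gain * (goal - level)
        path.append(level)
    return path

def error_effect(true_level0, error, gain, goal, dt, steps):
    """Absolute gap, step by step, between a run started from the true
    initial level and one started from an erroneous initial level."""
    base = simulate(true_level0, gain, goal, dt, steps)
    off = simulate(true_level0 + error, gain, goal, dt, steps)
    return [abs(o - b) for o, b in zip(off, base)]

negative_loop = error_effect(50.0, 5.0, 0.5, 100.0, 0.1, 100)
positive_loop = error_effect(50.0, 5.0, -0.5, 100.0, 0.1, 100)
```

In this small case the error of 5 units shrinks steadily under the negative loop and grows under the positive loop; more elaborate models with delays and several loops would need the same test repeated, since no such simple pattern is guaranteed.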

It is also necessary to develop a strong sensitivity to the prevention 
and recognition of modeling errors. For example, failure to consider all of the
factors that might affect a rate or the use of an improper function will, in
effect, model something other than one's hypothesis about reality. The diagnostic
messages in DYNAMO, like those of most good compilers, will catch most errors
that are violations of the DYNAMO language, whether these resulted from erroneous
formulation of equations or keypunch errors. They will also indicate certain
inconsistencies, lack of definition, and mathematical impossibilities. They
will not, of course, identify errors of conceptualization, or tell you when a
feedback loop should have been included to have a variable effect on a rate,
instead of merely supplying a constant.

In typical examples of dynamic simulation the variables are expressed 
in clearly defined and well understood metrics (e.g., numbers of people or
dollars, physical units of length, weight or time, electrical units of voltage,
or capacity, or ratios of such units). Such units are readily convertible to
other units of the same kind (feet to miles, months to years, pounds to tons, etc.)
by linear conversion constants. It is not clear, however, whether the arbitrary
metrics of psychological test scales and their various transformations, both
linear and nonlinear, can be used in the same way in dynamic simulation. In any
case, some kind of metric consistency with respect to raw scores vs. standard
scores and the proper choice of regression weights (b or beta) must be ensured
when such data are used in dynamic simulation.

It is likely that the use of standard scores which can go negative (Z-scores)
will cause some trouble in the behavior of a system equation. This may be



71 

avoided by the use of T-scores formed by adding a constant to the Z-scores with 
or without their normalization (McCall's T). Where standard scores are used involving
more than one group, the scores should be on a common normative base.
One can convert simulation outcomes, if necessary, to within-group defined metrics
for communication of results to different program managers in case they are using 
program norms rather than statewide norms. The nature and extent of some of these 
problems requires further discussion, and some can be headed off by appropriate 
management of testing, and of the reporting of test results. Note that many 
commercial tests are normed on arbitrary customer samples, and may or may not 
be relevant metrics for MISOE purposes. 
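
The conversion mentioned above can be sketched as follows. The conventional linear T-score (mean 50, standard deviation 10) is assumed here; no normalization of the score distribution is attempted, and the normative mean and standard deviation are hypothetical statewide values supplied by the caller so that every group is scored on the same base.

```python
def z_scores(raw, base_mean, base_sd):
    """Standard scores relative to a common normative base."""
    return [(x - base_mean) / base_sd for x in raw]

def t_scores(raw, base_mean, base_sd):
    """Linear T-scores, T = 50 + 10 * z, using the same normative base."""
    return [50.0 + 10.0 * z for z in z_scores(raw, base_mean, base_sd)]

# Hypothetical statewide norms applied to one program's raw scores
program_raw = [40.0, 50.0, 60.0]
program_t = t_scores(program_raw, base_mean=50.0, base_sd=10.0)
```

A score one standard deviation below the norm becomes 40 rather than -1, so a level or rate term built from it cannot change sign merely because of the scale chosen.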

Initiating values of levels and constants, many of which come from 
sample data and analysis in static space, are subject to sampling errors and 
errors of measurement. A fuller discussion of sampling errors and their effect 
on computed statistics will be included in Occasional Paper No. 12. It is apparent
that the use of a mean value for a level or a constant is associated with
a standard deviation, so that there is some noise in these values when they are
used in rate equations. Consideration should be given to the comparison of two
simulation models, one of which ignores such variation, and the other of which
takes it into account by means of the NORMRN function in DYNAMO. Similar considerations
apply to the effects of measurement error in levels and constants.
The correction for attenuation in correlational analysis in static space corrects 
certain relationships among variables, but not their observed values. 

The total variance about an observed value may consist of "true" 
variation of some kind, sampling error, or measurement error. Attempts to 
minimize measurement error by setting high reliability requirements on observed 
variables, and to minimize sampling error by means to be considered in Occasional 
Paper No. 12, will reduce the seriousness of error effects in dynamic simulation.
Nevertheless, we cannot be sure that they will be reduced below some unknown 



72 



tolerance level in dynamic simulation. This is why we need more information
about what that tolerance level might be for different kinds of simulation 
models. 

Nonrandom bias in levels and constants, when unknown, may have more 
serious effects in dynamic simulation than the more nearly random variations 
discussed above. The control of bias in sampling will be discussed in Occasional
Paper No. 12, through consideration of sampling logistics and weighting procedures. 
The present concern is with those simulation parameters (e.g., regression weights) 
resulting from correlational analysis, especially as a result of incomplete pre- 
diction. It may be that regression analysis will be much more useful for identi- 
fying important relationships than for supplying regression parameters to simula- 
tion. Actual values of dependent and independent variables will be available 
and normally should be used in simulation in preference to predicted values. One 
can use regression weights or partitioned variances where needed with greater 
confidence when the multiple correlation is high. Again, it may be worthwhile 
to conduct an experiment with a simulation model using data involving a full
model regression where R is high, and to compare the results with those of a
similar model, ignoring some of the regression variables and using the weights 
recomputed on the corresponding reduced regression model.* 

Because MISOE is pioneering the use of dynamic simulation with a mixture 
of physical, economic, and psychological data, it will probably not find the answers
to these questions in the available literature. It must, therefore, be prepared 
to be a pioneer in facing up to some of the methodological issues which are pre- 
sumably new in MISOE design and operation. It is not intended that these issues 
sidetrack the main operational thrust envisioned for MISOE, but only to ensure
a high level of accuracy in the results of analysis fed back to management, and 
on which their policy decisions may be based. 






*The weights for the dropped variables are zero.



73 



A Pseudodynamic Model as Nonlinear Programming

This section responds to a particular staff request. The request con- 
sisted of ascertaining whether the following problem could be formulated and 
solved in dynamic simulation: 

The manager within a certain program (e.g., automechanics)
wants to choose the least cost process mix which will transform
a certain input mix into a certain product mix.
The staff particularly wanted an example at this management level (input-process- 
product). The following assumptions and attitudes were imposed on the attempted 
solution: 

1. the example would be designed to raise issues for further discussion 
about the feasibility and techniques for handling such an inquiry via dynamic 
simulation methods; 

2. the possibility of a static space or linear programming solution to 
such an inquiry would be ignored for the present; 

3. the example would be set up as a comparison of two process mixes, 
submixes, or elements (whether human, physical, or organizational), in such a way 
that the model could be easily generalized to the comparison of more processes, 
or to additional input and output mixes; 

4. the fundamental flow would be from an input level (IL) of the number
of students with a certain input mix to the product level (PRODL) of the number
of students with a specified product mix, with process and cost information
moderating the rate equation corresponding to the flow of students through the 
process; 

5. there would be a flow channel for each process and the rate equations 
would be formulated in such a way that there would be a null rate for all but the 
least cost channel; 

6. the example would be kept as simple as possible without feedback






74 

loops and explicit economic or other information subsystems; all information, how- 
ever, must be available in storage or from static space analysis. 

The "flow diagram" so formulated appears in Figure 1. The following
rates are ignored:

Rn+1  from the source of input students,
Rn+2  those students with the "certain input mix" who go to some other output mix (OPML) by any process,
Rn+3  from other output mixes going to the societal impact sink,
Rn+4  from the specified product mix going to the societal impact sink,

where n = the number of processes compared.

Note that the processes, themselves, do not appear explicitly in this 
diagram, except as labels on the flow routes, although information about process-product
relations does. Note, also, that the two "processes" are not specified,
but could be the use of a laboratory vs. putting students through a cooperative work-study
plan; or having older, more experienced, and higher salaried instructors vs.
having younger, possibly more flexible, and lower paid instructors. 

The "rate" associated with each process consists in part of an auxiliary 
function of two factors: the probability that some portion of IL will move to 
PRODL in process time and that it costs so much per student to do so. No regression
information is required; only the student mix by product mix I/O table and the
corresponding cost per student table. This auxiliary function expresses the basic
flow rate in terms of the probable benefit and its cost, as an inverse cost-benefit
ratio, all over process time. Note that this permits the two (or more) processes
to require different lengths of time. The rate equation for each process is conceived
as consisting of the auxiliary function, or basic rate, modified (i.e.,
multiplied) by a CLIP function, or as many such CLIP functions as there are other 
processes in the model. The purpose of the CLIP function, here, is to leave the 






75 




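[Figure 1: flow diagram. Levels shown: IL (No. with certain input mix), PRODL (No. with specified product mix), and OPML (No. with other product mixes). For each process (A, B), an auxiliary function (AUX) of Prob and $/Stud. and a CLIP criterion (CRIT) bear on the rate along that process's flow route; rate Rn+4 is also labeled.]

Figure 1. Pseudodynamic Model for Process-Product Inquiry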



76 

rate as computed by the auxiliary equation, aultiplied by 1, if that process is 
already the one with the "best" cost-benefit ratio (it is only least-cost if all 
channels produce the same number of students with the given output mix for a given 
Input. number) . Otherwise, CLIP changes the auxiliary function to a rate of zero. 
It Is assumed that DYNAMO can solve the equations so formulated with a DT about 1/3 
the length of the shortest process and can be made to print out the values of the 
rate functions at 3-6DT. The answer is to take the process with a "non-zero" rate. 

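The auxiliary and rate equations are:

$$\mathrm{AUX}_A = \frac{\mathrm{Prob}(\mathrm{PROD}/\mathrm{IL})^{*} \times \mathrm{Studs}_A/\$}{\mathrm{Process}_A\ \mathrm{Time}}$$

$$\mathrm{AUX}_B = \frac{\mathrm{Prob}(\mathrm{PROD}/\mathrm{IL})^{*} \times \mathrm{Studs}_B/\$}{\mathrm{Process}_B\ \mathrm{Time}}$$

$$R_1 = \mathrm{AUX}_A \times \mathrm{CLIP}_A$$

$$R_2 = \mathrm{AUX}_B \times \mathrm{CLIP}_B$$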
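The auxiliary and rate logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DYNAMO code: the `Process` record, the helper names, and the numbers are hypothetical, and the several CLIP switches described in the text are collapsed into one comparison against all other processes.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    prob: float              # Prob(PROD/IL), read from the I/O table
    cost_per_student: float  # dollars per student, from the cost table
    time: float              # process length, in any consistent unit


def aux(p: Process) -> float:
    """Basic rate: probability times students-per-dollar, over process time."""
    return p.prob * (1.0 / p.cost_per_student) / p.time


def cost_per_product(p: Process) -> float:
    """Dollars spent per student who reaches the specified product mix."""
    return p.cost_per_student / p.prob


def clip(p: Process, others: list) -> float:
    """1.0 if p has the lowest cost per product student, else 0.0."""
    best = all(cost_per_product(p) <= cost_per_product(o) for o in others)
    return 1.0 if best else 0.0


def rates(processes: list) -> dict:
    """Rate for each process: AUX multiplied by the CLIP switch."""
    return {
        p.name: aux(p) * clip(p, [o for o in processes if o is not p])
        for p in processes
    }


# Hypothetical figures, for illustration only.
process_a = Process("A", prob=0.60, cost_per_student=1200.0, time=9.0)
process_b = Process("B", prob=0.50, cost_per_student=800.0, time=12.0)
result = rates([process_a, process_b])
chosen = [name for name, rate in result.items() if rate > 0.0]
```

The process left with a non-zero rate is the answer to the inquiry; as the text goes on to observe, the same choice follows from comparing the two cost figures directly.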

Whatever else can be said about this example, it appears to have answered the manager's inquiry, as stated, but the formulation, regardless of flow diagram symbols, and equations of the "proper" form, is not dynamic simulation in the Forrester sense. It appears that dynamic simulation modeling concepts have been used to perform a kind of brute force nonlinear programming.



* Probability of a student obtaining the specified product mix given that he had the certain input mix.

It should be noted that this formulation does not define the least cost process mix, but only chooses the least expensive one among those defined; presumably this could be done by direct comparison of total product costs. The particular class of inquiries involving process-product analyses, with relatively fixed program lengths, and constrained by fixed input and output, would not seem to require dynamic simulation, unless embedded in the larger system implied by other student and product mixes, with the larger system including data over a longer time (e.g., impact data), and feedback loops. Moreover, there is a kind of cohort batch effect in the flow of students through a program, which can probably be ignored when program length is small compared to the process time for continuous flows in a larger simulation model.

From the viewpoint of dynamic simulation, the present model is a degenerate one and the forcing of the rate values to be either positive or zero a gimmick. The manager's inquiry is perfectly reasonable, but should be soluble in static space by other procedures or models. The next two sections discuss two such possibilities.

A More Rational Approach 

Another approach to the problem discussed in the last section is to formulate it in terms of vector differentiation, following the method described by Van de Geer (Introduction to Multivariate Analysis for the Social Sciences. W.H. Freeman and Company: San Francisco, 1971, p. 58-59).

1. Perform the regression analysis of "the certain product mix" on the process variables, using the students with certain input mixes as the unit of analysis. This gives the equation:

Predicted product level (PRODL) = b'X + C, which is converted to the form g = b'X + C - PRODL = b'X + K. X is the process mix column vector sought; b is the column vector of regression weights; C is the regression constant (K = C - PRODL).

2. Assume that for each process variable selected by the regression analysis the unit cost data are available (e.g., $/sq. ft., $/teacher contact time with student, etc.). The total product cost is given as a vector, Y, of the actual process variable costs: Y = VX, where V is a diagonal matrix of unit costs and X is the column vector of process variables, as above. (The grand total product cost is 1Y, where 1 is the unit row vector.)

3. The cost function in vector form is linear. However, in order to apply the suggested procedure, we need it in bilinear form for minimizing under the constraint relating process to product, as expressed in the regression solution. To obtain the bilinear form required, define the scalar Y* = Y'Y = X'V'VX. Minimizing Y* minimizes the sum of the squares of the actual costs of the process variables. Letting V* = V'V, Y* = X'V*X, the bilinear form required for solution.

4. Set the function g (in step one above) to zero.

5. Define the auxiliary function:

F = Y* - μg = X'V*X - μ(b'X + K), where μ is a Lagrangian multiplier. Take partial derivatives of F with respect to the x_i, evaluate the Lagrangian multiplier, and solve for those values of process variables that minimize the sum of the squares of the costs of those process variables most predictive of the stated product mix. It will be shown below that it is a minimum.

The above steps require a little further discussion. In step 1, the proper treatment of the control of the regression for "the certain input mix" is not entirely clear. If the input mix is a constant vector, input is already controlled; if the input mix is a class of input vectors, a possible treatment is to residualize the PRODL against it, and then predict the residualized PRODL from process variables. In step 2 there is the assumption that cost data are available for each process variable selected and that they are in, or transformable into, the proper form. The substitution, in step 3, of the sum of squares of actual process costs for the total product cost as the function to be minimized seems plausible, if somewhat forced. If a linear function is used, the desired variables differentiate out and a trivial solution results. Using the squares of process variable costs will tend to depress the x_i values for the most expensive process variables, which has some intuitive appeal.

The setting of the g function to zero in step 4 amounts to using the actual, rather than the predicted, value of PRODL. If the residualizing against input is used, it amounts to using the actual residualized PRODL rather than the predicted residual. The validity of doing this depends on the actual value of R in the regression solution.

It remains to state the solution in step 5 more explicitly, and to show that the solution is a minimum:

$$\frac{\partial F}{\partial X} = 2V^{*}X - \mu b = 0$$

$X = \mu V^{*-1}b/2$, but $\mu$ is as yet unknown. Since $b'X = -K$ when $g = 0$, $b'X$ is known, and can be set $= \mu\, b'V^{*-1}b/2$. Solving for $\mu$, $\mu = -2K\,(b'V^{*-1}b)^{-1}$. Substituting $\mu$ back in the equation for $X$, $X = -K\,V^{*-1}b\,(b'V^{*-1}b)^{-1}$, the column vector of values of the process variables that minimize Y*.

To show that the solution is a minimum, evaluate the second derivative of the F function; it is 2V*. Since the squares of process variable costs are positive, the second derivative is positive, and therefore the solution is a minimum.
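The closed-form solution can be checked numerically. The sketch below is an illustration only: it assumes V is diagonal, as stated in step 2, so that V*^{-1}b reduces to dividing each regression weight by its squared unit cost; the function names and all figures are hypothetical.

```python
def min_squared_cost_mix(b, unit_costs, K):
    """Return X = -K V*^{-1} b / (b' V*^{-1} b) for diagonal V* = V'V."""
    # V* has entries v_i^2 on its diagonal, so V*^{-1} b is elementwise.
    w = [bi / (vi * vi) for bi, vi in zip(b, unit_costs)]
    denom = sum(bi * wi for bi, wi in zip(b, w))  # b' V*^{-1} b
    return [-K * wi / denom for wi in w]


def squared_cost(x, unit_costs):
    """Y* = X'V*X, the sum of squared process variable costs."""
    return sum((vi * xi) ** 2 for vi, xi in zip(unit_costs, x))


# Hypothetical regression weights, unit costs, and constant K = C - PRODL.
b = [2.0, 1.0, 0.5]
unit_costs = [4.0, 1.0, 2.0]
K = -10.0

x_opt = min_squared_cost_mix(b, unit_costs, K)
constraint_value = sum(bi * xi for bi, xi in zip(b, x_opt)) + K  # g, should be ~0

# Move along a direction d with b'd = 0, so the constraint still holds.
d = [1.0, -2.0, 0.0]
x_other = [xi + 0.1 * di for xi, di in zip(x_opt, d)]
```

Any other point satisfying b'X = -K, such as `x_other`, should give a larger value of Y*, which is the sense in which the second-derivative argument establishes a minimum.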




The feasibility and generality of this solution for MISOE should be discussed further and compared with other possible alternatives. It appears to be a form of nonlinear programming.

A Linear Programming Solution 

The approach in the previous section started with the recognition that the vector differentiation of a bilinear function could be useful in such a problem, but in step 3, the cost function was redefined to ensure a match to the model. The essentially linear nature of the problem, both in the regression function and in the original cost function, suggests the possible use of linear programming as a solution. In order to formulate a problem by linear programming, it is necessary that the basic concepts and assumptions in such an approach are met (Dantzig, George B., Linear Programming and Extensions. Princeton, N.J.: Princeton University Press, 1963). With a little imagination, the process variables may be thought of as "black box" activities, and the students as items flowing from input to product spaces. Imagination becomes somewhat strained, however, with respect to the concept that activity levels (i.e., values of process variables) are changed by flows into and out of the "activities," and by the "proportionality" assumption that a doubled flow (of students) doubles the activity levels. There is also some strain with respect to the additivity assumption, as usually interpreted in linear programming problems. Nevertheless, the assumption that process levels are nonnegative is readily met (with linear transformations on the variables, if necessary), and the expression of conservation of a precious item (money, in this case) in a linear objective function is applicable. Moreover, it is possible to formulate a set of equations in our problem, which have the same mathematical form as those in a typical linear programming problem. These equations are:

1. Y_res = BX, where Y_res is a column vector of PRODL, residualized against input; B is a diagonal matrix of regression weights; and X is the column vector of process variable levels. The B matrix may be replaced with a matrix, C, of variance contributions, which may not be a diagonal matrix. In either case this subsystem of equations corresponds in form to the material balance equations of linear programming.

2. z(min) = vX, where z is the product cost to be minimized, v is the row vector of unit process costs, and X is the column vector of process levels, as above. This is the objective function.

3. x_i ≥ 0 expresses the nonnegativity restriction, which must apply to both the regression and current models in this formulation.

An important question is whether, in a given application, this system of equations can be solved with the simplex algorithm. Moreover, the concerns expressed throughout this paper about model validity and inferential error are also applicable here.
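A small sketch may help show what a linear objective does to the answer. It is not a simplex implementation and it simplifies the formulation above: it assumes a single product constraint b'X = target (in place of the full system), positive unit costs, and nonnegative levels. Under those assumptions the optimum of a linear program lies at a vertex with one non-zero process level, found by comparing cost per unit of product. All names and figures are hypothetical.

```python
def least_cost_corner(b, unit_costs, target):
    """Minimize sum(v_i * x_i) subject to sum(b_i * x_i) = target, x_i >= 0.

    Assumes every unit cost is positive and target is nonnegative.
    """
    candidates = [i for i, bi in enumerate(b) if bi > 0]
    if target < 0 or not candidates:
        raise ValueError("sketch assumes target >= 0 and a positive weight")
    # Cheapest process variable per unit of predicted product.
    best = min(candidates, key=lambda i: unit_costs[i] / b[i])
    x = [0.0] * len(b)
    x[best] = target / b[best]
    return x


def total_cost(x, unit_costs):
    """z = vX, the linear objective function."""
    return sum(v * xi for v, xi in zip(unit_costs, x))


# Hypothetical regression weights and unit costs.
b = [2.0, 1.0, 0.5]
unit_costs = [4.0, 1.0, 2.0]
x = least_cost_corner(b, unit_costs, target=10.0)
z = total_cost(x, unit_costs)
```

With several balance equations, or with upper bounds on process levels, the solution would generally involve more than one process variable and a general-purpose linear programming routine would be needed.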

With this, we now have three formulations of the original inquiry: 

1. pseudodynamic, which told which of two process mixes already 
defined was less expensive, but which did not define the mix in 
terms of levels; 

2. nonlinear (i.e., bilinear), which yielded a process mix minimizing the sum of squared costs rather than the total product cost, and

3. the present linear approach, which defines a process mix minimizing the total product cost, as asked, if the simplex solution exists.

For the kind of problem raised here, it is recommended that the linear approach be tried first, and if it fails, that the nonlinear approach be tried. It is probably best to write off the pseudodynamic approach as a learning experience. Again the feasibility and generality of the linear approach for MISOE problems should be discussed further. It is likely that minor variations in the way the original inquiry is formulated, while still focused on the goal of finding the least cost process mix, will render a better match with linear programming concepts and assumptions. By analogy with the transportation type of problem, linear programming would appear to be feasible for a class of inquiries characterized by different student input mixes going through different process channels to different product mixes. The different student mixes are the items; the different and unknown numbers of students of each type going through each process are the activities. Here, the processes, per se, are "black boxes," available inputs and output quotas are constraints, and the objective function is to minimize the overall system cost. The relevant process data and associated process costs presumably provide the coefficients in the system of equations. That is why the personnel assignment problem in the military, referred to in an earlier section, was amenable to the linear programming approach.




Part Six. Epilogue 

This short, final part consists of a few general, summarizing statements, or "conclusions," as follows:

1. MISOE needs a highly varied repertoire of general models and algo- 
rithms. 

2. Regression is a powerful tool. It may solve some problems directly, or with little further effort in static space. It may be used to generate expectancy tables of the type described by the author for counseling and/or admissions problems (Creager, J. A. "Use of Research Results in Matching Students and Colleges," The Journal of College Student Personnel, Sept. 1968). Regression is most likely to be useful in identifying the important variables for other analyses and to give some information about the "relative importance" of those selected. Regression parameters may also be useful as simulation parameters, but this appears to be less likely than originally thought.

3. MISOE needs to maintain flexibility of options until more is known 
about the relative frequency of inquiry types from various levels of management. 

4. There may be some limitations on the utility of dynamic simulation, because the conditions for its use are not yet completely specified, and its sensitivities to various kinds of error are not yet adequately documented. This whole area needs further study.

5. MISOE needs to engage in some "shakedown" experiences before becoming fully operational, perhaps including a pilot, partial implementation period. Both ethical and pragmatic considerations require great attention in MISOE development and implementation to the sources and control of inferential errors in the application of all analysis models.

6. MISOE needs to assess more clearly the utility of linear programming and other models for integrating the economic and noneconomic aspects of analysis.




7. The current status of MISOE represents a vision of great potential use as a system in support of management. Many problems have been faced and worked out, either in whole or in part. Much remains yet to be resolved before initial implementation; some matters will be resolved in the context of operating experience.



APPENDIX A 



TECHNICAL REPORTS FOR MISOE 






.102

HEADQUARTERS
6560TH RESEARCH AND DEVELOPMENT GROUP
(PERSONNEL RESEARCH LABORATORY)
HUMAN RESOURCES RESEARCH CENTER
LACKLAND AIR FORCE BASE
San Antonio, Texas

STAFF RESEARCH MEMORANDUM    2 September 1953

Project: 503-001-0016

STUDIES IN METHODOLOGY
II. EFFICACY OF THE UNIVARIATE FORMULAS FOR CORRECTING FOR RESTRICTION OF RANGE

John A. Creager

One of the frequently encountered problems in the treatment and interpretation of psychological data is that of correcting a correlation coefficient for restriction of range. This memorandum is concerned with some characteristics of the correction formulas as they are applied to data in which the variables are continuous and normally distributed in the unselected population. For the case of univariate selection, three basic formulas are available (Research Report, "Research Problems and Techniques", pp. 63-68; cf. Thorndike, R. L., Personnel Selection, pp. 169-176):



Correction Formula I:

$$R_{12} = \sqrt{1 - \frac{s^{2}}{S^{2}}\left(1 - r_{12}^{2}\right)}$$

where $R_{12}$ is the corrected correlation coefficient, $r_{12}$ the available correlation in a sample restricted on variable 1, $s$ is the standard deviation of the indirectly restricted variable 2 in the restricted group, and $S$ is the standard deviation of the indirectly restricted variable 2 in the unrestricted group.

Correction Formula II:

$$R_{12} = \frac{r_{12}\,(S/s)}{\sqrt{1 - r_{12}^{2} + r_{12}^{2}\,(S^{2}/s^{2})}}$$

where the symbols have the same meaning as in Correction Formula I, except that the standard deviations are available for the directly restricted variable 1.

Correction Formula III:

$$R_{12} = \frac{r_{12} + r_{13}r_{23}\left(S_3^{2}/s_3^{2} - 1\right)}{\sqrt{\left[1 + r_{13}^{2}\left(S_3^{2}/s_3^{2} - 1\right)\right]\left[1 + r_{23}^{2}\left(S_3^{2}/s_3^{2} - 1\right)\right]}}$$

where direct restriction occurred on variable 3, and both variables 1 and 2 had been indirectly restricted by their correlations with the directly restricted variable.

*Staff Research Memoranda are informal papers intended to record opinions and preliminary reports of studies. They may be expanded, modified, or withdrawn at any time and hence are not suitable for inclusion or reference in more permanent reports of a scientific or technical character.
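The three univariate corrections, in their standard textbook form, can be written as short Python functions. This is a sketch for illustration: the function and argument names are hypothetical, and lower-case `s` and upper-case `S` stand for the restricted and unrestricted standard deviations as in the text.

```python
from math import sqrt


def correct_formula_1(r12, s2, S2):
    """Formula I: SDs known for the indirectly restricted variable 2.

    Returns the positive root only; the sign of the correlation has to be
    carried over from r12 by the user.
    """
    return sqrt(1.0 - (s2 ** 2 / S2 ** 2) * (1.0 - r12 ** 2))


def correct_formula_2(r12, s1, S1):
    """Formula II: SDs known for the directly restricted variable 1."""
    k = S1 / s1
    return r12 * k / sqrt(1.0 - r12 ** 2 + (r12 * k) ** 2)


def correct_formula_3(r12, r13, r23, s3, S3):
    """Formula III: variables 1 and 2 indirectly restricted through variable 3."""
    u = (S3 / s3) ** 2 - 1.0
    return (r12 + r13 * r23 * u) / sqrt((1.0 + r13 ** 2 * u) * (1.0 + r23 ** 2 * u))


# Hypothetical restricted-sample values, for illustration only.
r_corrected = correct_formula_2(r12=0.35, s1=0.5, S1=1.0)
```

When the stated assumptions hold exactly (linear, homoscedastic regressions with slopes unaltered by selection), each function recovers the unrestricted coefficient from the restricted values; with sample data the result carries the sampling error of the restricted coefficient.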

For the case of simultaneous correction of many coefficients, as in a correlation matrix, with either univariate or multivariate selection, matrix formulations for correction of range restriction are available, and will be discussed in a subsequent memorandum.

The basic assumptions underlying these various formulations for correction of range restriction are:

a. The regressions of indirectly selected variables upon directly selected variables are linear and homoscedastic in the unrestricted population.

b. The slopes of the regression lines are unaltered by selection.

c. The standard error of estimate of indirectly selected variables from directly selected variables is unaltered by selection.

d. The partial correlation between indirectly selected variables is unaltered by selection.

If the first two assumptions are met, the last two will generally be met also. In Gulliksen's development of these formulas, no explicit assumption is made regarding normality of the distributions of the two variables. Extreme deviations from normality will, however, make it quite difficult to meet the stated assumptions.

In practice, the application of these formulas assumes that direct selection has occurred entirely on known and measured variables. Thus technical school criterion data may have been subjected to sources of selection other than the explicit requirements for career guidance and assignment. In such a case the correction would generally err on the side of conservatism, i.e., the formulas would underestimate correlation for the unselected population.

The restriction imposed by the meeting of priorities in fulfilling quotas may also affect the applicability of these formulas. For example, 1000 men may have been assigned to two technical schools, A and B, on the basis of the same stanine cut-off. Suppose then that 600 men qualify with a stanine of five or greater. Suppose further that School A requires 250 men with top priority and School B must take what is left. The assignment of the 600 men to the two schools, in terms of their stanine scores, would look as follows:

Stanine      A      B
   9        40      0
   8        70      0
   7       120      0
   6        20    150
   5         0    200
   N       250    350   (N = 600)

While this is a rather extreme example, it is obvious that the assumptions listed above will have been violated.
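The quota example can be reproduced with a short sketch. The function name is hypothetical, and the counts of qualified men at each stanine are those implied by the table (the sum of the A and B columns).

```python
def assign_by_priority(counts_by_stanine, quota_a):
    """Fill School A's quota from the top stanine down; School B takes the rest.

    Returns {stanine: (number to A, number to B)}.
    """
    table = {}
    remaining = quota_a
    for stanine in sorted(counts_by_stanine, reverse=True):
        n = counts_by_stanine[stanine]
        to_a = min(n, remaining)
        remaining -= to_a
        table[stanine] = (to_a, n - to_a)
    return table


# Qualified men (stanine five or greater), as implied by the table: 600 in all.
qualified = {9: 40, 8: 70, 7: 120, 6: 170, 5: 200}
table = assign_by_priority(qualified, quota_a=250)
total_a = sum(a for a, _ in table.values())
total_b = sum(b for _, b in table.values())
```

School B receives no one above stanine 6, so within each school the range of the selection variable is cut at both ends by a rule that the correction formulas do not represent.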






The present study was undertaken to test the validity of the univariate correction formulas under conditions meeting the assumptions, and to ascertain the effect, if any, of linear dependence between directly and indirectly restricted variables.

These studies were carried out using an experimental population, previously prepared using punched card procedures, and described in the first memorandum of this series (Studies in Methodology - I. Description of an Experimental Domain for Methodological Studies). The theoretical correlation and distribution statistics for the unrestricted population are given in Table 1. A random sample of 500 cases was obtained and two restricted samples prepared as follows:

Restricted Sample I. The 300 cases having the highest score on the composite test, #11, were selected from the random sample of 500 cases. This provided a sample of 300 cases directly restricted only on a composite score, thus simulating the conditions resulting from selection on an aptitude index stanine of five or greater to obtain a pool of "qualified" men. The fulfillment of quotas may yield different results as previously discussed.

Restricted Sample II. The 180 cases having the highest score on test #1 were selected from the random sample of 500 cases. This sample simulates a somewhat more severe restriction upon a single non-composite test.

The intercorrelations and distribution statistics for these two restricted samples are given in Tables 2 and 3, respectively.

With these restricted samples and the "true" population values available, certain questions concerning range restriction corrections can be answered. The first question was that of comparing Correction Formulas I and II for their efficacy in correcting a correlation between a directly restricted variable and an indirectly restricted variable. The difference between the formulas is dependent upon the available information, i.e., whether standard deviations are known for the indirectly restricted variable (Formula I) or for the directly restricted variable (Formula II). In many practical situations both standard deviations are known and, if both formulas are applied to correcting the same restricted coefficient, appreciable discrepancies may occasionally be noted in the corrected coefficients. Table 4 shows the errors (corrected coefficients minus "true" coefficients) obtained by applying Correction Formulas I and II to correlations involving the directly selected variable. Somewhat larger errors are encountered with Formula I than with Formula II. The greater errors for restricted Sample II are due to the smaller size of the sample, i.e., due to sampling errors in the restricted correlations. It is also apparent that Formula I is somewhat more sensitive to such errors. The highest errors occurred where the restricted correlations were negative. The practical conclusion is that, where standard deviations are known for the directly restricted variable, Correction Formula II should be used in preference to Formula I.

Attention is called to the fact that the corrections in restricted Sample I are valid for tests which are weighted into the restricted composites. Hence, Correction Formulas I and II do not appear to be invalidated by linear dependence between directly and indirectly restricted variables.






Another question considered involves the efficacy of Formula III for correcting intercorrelations among indirectly selected variables. This was tested only for restricted Sample I. Each correlation was corrected individually by Formula III and the resulting coefficients compared with the "true" values from Table 1 to yield the error matrix given in Table 5. This matrix has been augmented by a row vector consisting of the errors resulting from applying Correction Formula II to the correlations involving the directly restricted variable. These errors are small enough to be attributed to sampling errors in the original restricted sample (treated as a random sample of 300 cases from a restricted population). It is also apparent that Correction Formula III is valid for those particular tests which are components of the explicitly restricted composite.

It should be emphasized that the present study dealt exclusively with univariate selection, involving Pearson coefficients of correlation, where variables were continuous, normally distributed, and where practically all of the restriction occurred only on the single directly restricted variable.


[Tables 1 through 5 of this memorandum are not legible in the source copy. Table 1: theoretical correlation and distribution statistics for the unrestricted population. Tables 2 and 3: intercorrelations and distribution statistics for restricted Samples I and II. Table 4: errors from Correction Formulas I and II. Table 5: error matrix for Correction Formula III.]



3-103 



6560TH RESEARCH AND DEVELOPMENT GROUP
(PERSONNEL RESEARCH LABORATORY)
HUMAN RESOURCES RESEARCH CENTER
LACKLAND AIR FORCE BASE
San Antonio, Texas

STAFF RESEARCH MEMORANDUM*    2 September 1953
Project: 503-001-0016

STUDIES IN METHODOLOGY
III. A NOTE ON THE MATRIC FORMULATIONS FOR CORRECTING FOR RANGE RESTRICTION



The present inquiry is concerned with two major problems. The first involves a clarification and interpretation of the matric formulations, with special attention to the relationships among various formulas. The second problem involves the empirical study of the efficacy of the matric formulas.

The assumptions underlying the matric formulas are the same as those for the three univariate selection formulas discussed in a previous memorandum (Studies in Methodology - II. Efficacy of the Univariate Formulas for Correcting for Restriction of Range). In addition the number of variables must be identical in the restricted and unrestricted groups; indeed, the variables themselves must be identical in both groups for the matric operations to have any meaning.

Prior to consideration of the first problem, it is necessary to clarify the notation systems used by Thorndike and Gulliksen, respectively, and to be sure that subscript notation is consistent for the subsequent discussion. Table 1 provides useful reference for this purpose.

*HRRC Staff Research Memoranda are informal papers intended to record opinions and preliminary reports of studies. They may be expanded, modified, or withdrawn at any time and hence are not suitable for inclusion or reference in more permanent reports of a scientific or technical character.

John A. Creager



[Table 1, comparing the notation systems of Thorndike and Gulliksen, is not legible in the source copy.]



The problem of correcting a complete correlation matrix for the restricted group, $r_{(a+x)}$ in the Thorndike notation, to that of the unrestricted group, $R_{(a+x)}$, is broken down into two separate problems, correcting $r_{ax}$ to $R_{ax}$ and correcting $r_{xx}$ to $R_{xx}$; thus:

$$R_{(a+x)} = \begin{pmatrix} R_{aa} & R_{ax} \\ R_{xa} & R_{xx} \end{pmatrix}$$

where it is assumed that $R_{aa}$ and $H_a$ are known. The two formulas for accomplishing these corrections are:

$$\text{(IIM)}\qquad R_{ax} = R_{aa} H_a b_{ax} H_x^{-1}$$

$$\text{(IIIM)}\qquad R_{xx} = H_x^{-1}\left(r_{xx} - b_{ax}' r_{aa} b_{ax} + b_{ax}' H_a R_{aa} H_a b_{ax}\right) H_x^{-1}$$

where $b_{ax}$ is the matrix of regression weights for predicting each indirectly restricted variable, x, from the directly restricted variables, a; and $H_x$ is a diagonal matrix obtained from the square root of the diagonal of the matrix, $P_{xx}$, resulting from the operations in the parentheses. Equation IIIM may also be written

$$R_{xx} = H_x^{-1} P_{xx} H_x^{-1} = D_x^{-1/2} P_{xx} D_x^{-1/2}$$



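The reason for naming these formulas IIM and IIIM, respectively, is to emphasize their relation to the Thorndike univariate correction formulas II and III. For the case of one directly selected variable and two indirectly selected variables, $R_{aa} = R_{33} = 1$, $H_a = H_3 = S_3/s_3$, and $b_{ax} = [\,r_{13}\ \ r_{23}\,]$. Hence formula IIIM becomes:

$$P_{xx} = \begin{pmatrix} 1 & r_{12} \\ r_{12} & 1 \end{pmatrix} - \begin{pmatrix} r_{13}^{2} & r_{13}r_{23} \\ r_{13}r_{23} & r_{23}^{2} \end{pmatrix} + \frac{S_3^{2}}{s_3^{2}} \begin{pmatrix} r_{13}^{2} & r_{13}r_{23} \\ r_{13}r_{23} & r_{23}^{2} \end{pmatrix}$$

so that

$$P_{12} = r_{12} - r_{13}r_{23} + r_{13}r_{23}\,\frac{S_3^{2}}{s_3^{2}}, \qquad P_{ii} = 1 - r_{i3}^{2} + r_{i3}^{2}\,\frac{S_3^{2}}{s_3^{2}} \quad (i = 1, 2),$$

and

$$R_{12} = \frac{P_{12}}{\sqrt{P_{11}P_{22}}},$$

which is Thorndike univariate correction formula III.

Similarly, for a single directly restricted variable and a single indirectly restricted variable, formula IIM becomes:

$$R_{ax} = r_{ax}\,\frac{S_a}{s_a}\,H_x^{-1} = \frac{r_{ax}\,(S_a/s_a)}{\sqrt{1 - r_{ax}^{2} + r_{ax}^{2}\,(S_a^{2}/s_a^{2})}}$$

or Thorndike Univariate Correction Formula II.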

Thus, it is seen that formulas IIM and IIIM are generalizations of formulas II and III, respectively, for handling many coefficients at once. Further inspection of the matric formulas reveals that, with multivariate selection, the correlations among directly restricted variables are taken into account. R_aa reduces to a 1 x 1 matrix of unity in the univariate selection case. Hence, it may be expected that serial application of univariate selection formulas to correct for multivariate selection will be fallacious since R_aa is thereby assumed to be an identity matrix. It will, indeed, be a rare case where multiple cut-offs involving uncorrelated variables will be used. It should also be noted that it is the correlations among directly restricted variables in the unselected group that are involved here. Hence, it is also fallacious to ignore these correlations simply because r_aa = I. The off-diagonals may have been reduced to zero by the selection process itself.

To complete the clarification and interpretation of the matric formulations, the formulas given by Gulliksen in terms of covariances were translated to those given by Thorndike in terms of correlations. Starting with equation 38 (Theory of Mental Tests, p. 165) and transposing both sides:

but r Vl/2 vl/2, 

^xx « Vl/2 vl/2, and 

Substituting ^Ci3 

PrMUlUplortns both sides by V;V2 postmultiplyl/ig both sides by V-V2 giyao, 

but vl/2 n and Vj/2 v-l/2 = Hy 

Hence ^^3^ becomes : 

0) V^"xV"5^^• 
Translating subscripts to Thorndike notation, (G) becomes:

(T) R_ax = R_aa H_a b_ax H_x^{-1}

which is identical with formula IIM.






If one wtarta with equation ,,4^ (.2ii«21X £t iiSaiiSi Tqptp. p. i66j{ 

and oiciilaraubBtitutions are made to convert covarianca matriceo to correlation 
matricoB, / g\ becomes: 

Pro^ anA p03t-«iultipa^jLng both ' sid©» by Vy and rearranging: 
but vj/a « vl/2 V-V2 . H-1, and: 

vrfbiich, when changed to Thprjidike notatxon, roadoi 
or fomida 

Attention may now be focused on the empirical evaluation of these matrix formulations. These studies were carried out using an experimental population previously prepared, using punched card procedures, and described in the first memorandum of this series (Studies in Methodology - I. Description of an Experimental Domain for Methodological Studies). The theoretical correlations and distribution statistics for the unrestricted population are given in Table 2. A random sample of 500 cases was obtained, and then doubly restricted by first selecting the 300 cases having the highest scores on the composite test, #11, and then selecting the 180 cases, from the 300 case sample, having the highest scores on test #1. The intercorrelations and distribution statistics for this doubly restricted sample are shown in Table 3.
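The two-stage selection can be sketched as follows. The simulated scores are hypothetical stand-ins for the punched-card population: the tests are drawn independently here and the composite simply sums tests 3 and 5, which is an assumption made for illustration and not the memorandum's experimental domain.

```python
import random


def top_cases(cases, score, n):
    """Return the n cases with the highest value on the named score."""
    return sorted(cases, key=lambda c: c[score], reverse=True)[:n]


rng = random.Random(1953)  # fixed seed so the sketch is repeatable
random_sample = []
for case_id in range(500):
    t1, t3, t5 = (rng.gauss(0.0, 1.0) for _ in range(3))
    random_sample.append(
        {"id": case_id, "test1": t1, "test3": t3, "test5": t5,
         "test11": t3 + t5}  # hypothetical composite
    )

# First restriction: 300 highest on the composite test, #11.
first_stage = top_cases(random_sample, "test11", 300)
# Second restriction: 180 highest on test #1 among those 300.
doubly_restricted = top_cases(first_stage, "test1", 180)
```

Correlations computed on `doubly_restricted` would then play the part of the given matrix that formulas IIM and IIIM are asked to correct.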




To demonstrate the efficacy of formulas IIM and IIIM, the intercorrelations
for variables 1, 3, 5, 6, 10, and 11 were used. Restriction is on variables 1
and 11. Variables 3 and 5 are weighted into the restricting composite test, #11;
variables 6 and 10 are not so weighted. Table 4 shows the given matrix, the
corrected matrix, the "true" population matrix, and the error matrix, the latter
being obtained by subtracting the "true" population matrix from the corrected
matrix. Corrections were carried out using Rx;ll « o4V^#. The magnitudes of the
errors are attributable to the sampling errors in the given correlation matrix
(N = 180; σ_r = 0.075). It is apparent that the linear dependence among
variables 3, 5, and 11 has not distorted the correction process. However, tests 3
and 5 are implicitly selected variables. The effect of linear dependence among
explicitly selected variables is neither known nor likely to be encountered.
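
The sampling-error figure quoted for the restricted sample can be checked with a short sketch. It assumes the memorandum uses the large-sample approximation of 1/sqrt(N) for the standard error of a correlation near zero; the source does not state the formula, so the function below is an illustration rather than the author's computation.

```python
from math import sqrt


def approx_se_of_r(n: int) -> float:
    """Approximate standard error of a correlation near zero: 1 / sqrt(N).

    Assumption: this approximation is what lies behind the quoted value;
    the memorandum itself does not spell the formula out.
    """
    return 1.0 / sqrt(n)


# Doubly restricted sample of 180 cases.
se_180 = approx_se_of_r(180)
print(round(se_180, 3))
```

Under that assumption, a sample of 180 cases gives a value of about .075, which agrees with the figure in the text.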

The efficacy of the matrix formulas for the special case of univariate
selection is easily demonstrated. The sample of 500 cases was subjected to single
restriction by taking the 300 cases with highest scores on composite test #11.

The intercorrelations for variables 1, 3, 5, ^ and 11 were corrected by the
matrix formulas and compared with the corrections obtained for each coefficient
by formulas II or III. The resulting error matrices were identical.

Although the formidable appearance of the matrix formulas has probably
discouraged their wider use, the frequency with which correlation matrices from
restricted groups are encountered in Air Force data would seem to justify more
frequent use of these formulas. They must be used when selection is multivariate
and the restriction variables are correlated. When selection is univariate, the
procedure is highly efficient and requires less time than correcting each
coefficient separately. It is rare that selection will have occurred on more than
two variables, and hence Thorndike's statement about the laborious nature of the
computations "when several variables are directly restricted" (Personnel Selection,
p. 1?6), while true, need not discourage their use for the more common univariate
and bivariate selection problems.






PERSONNEL RESEARCH LABORATORY
AIR FORCE PERSONNEL AND TRAINING RESEARCH CENTER
AIR RESEARCH AND DEVELOPMENT COMMAND
LACKLAND AIR FORCE BASE
San Antonio, Texas

STAFF RESEARCH MEMORANDUM*                                  12 April 1954

Task 77006



Studies in Methodology
V. The Efficacy of Two Variants of Thorndike Formula #7 for
Correcting Correlation Coefficients for Range Restriction



John A. Creager



The efficacy of the three basic univariate formulas for correcting
correlation coefficients for range restriction was discussed in a previous
Staff Research Memorandum. Thorndike correction Formula #7 (Thorndike, R. L.,
Personnel Selection, p. 174) is applicable for correcting the coefficient of
correlation between two variables when direct restriction has occurred on a
third variable. This formula requires knowledge of three correlations
obtained for the restricted population: r12, the coefficient being corrected;
and r13 and r23, the correlations of each variable with the directly restricted
variable. In certain instances, the correlations with the directly
restricted variable may be known only for the unrestricted group. Thorndike
gives a variant of Formula #7 (#8) for the situation where one of the
correlations involving the directly restricted variable is known for the
restricted group and the other is known for the unrestricted group. It is the
purpose of this memorandum to report a small study carried out to:

a. show how Thorndike's Formula #8 was derived,

b. derive a variant of Formula #7 where both correlations involving the
directly restricted variable are known only for the unrestricted group,

c. demonstrate the efficacy of both formulas for obtaining an estimate
of the unrestricted value, R12.

The need for these two variants of Thorndike's Formula #7, while not
common, can arise in practical situations. Thus, Formula #8 would be used
in a validation study where the correlation between a test whose validity is



*AFPTRC Staff Research Memoranda are informal papers intended to record
opinions and preliminary reports of studies. They may be expanded, modified,
or withdrawn at any time and hence are not suitable for inclusion or
reference in more permanent reports of a scientific or technical character.






under investigation and the selection test is known only for the unselected
group. If, in addition, the unrestricted validity of the selection score
must also be used, a second variant of Thorndike's Formula #7 would be
needed.

In line with the previous studies in this series on range restriction
formulas, it will be convenient to refer to Thorndike's Formula #7 as
univariate correction Formula III, Thorndike's Formula #8 as univariate
correction Formula III A, and the second variant to be derived as univariate
correction Formula III B.

If test 1 is the test under investigation, test 2 a criterion variable,
and test 3 the selection test, the basic univariate correction Formula III
reads as follows:




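\[
R_{12} = \frac{r_{12} + r_{13} r_{23} \left( \frac{S_3^2}{s_3^2} - 1 \right)}
{\sqrt{\left[ 1 + r_{13}^2 \left( \frac{S_3^2}{s_3^2} - 1 \right) \right]
\left[ 1 + r_{23}^2 \left( \frac{S_3^2}{s_3^2} - 1 \right) \right]}}
\tag{1}
\]

where \(S_3\) is the standard deviation of the unrestricted group and \(s_3\) that of
the restricted group.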

The derivations of the variant formulas involve substituting expressions
for r13 and r23 in formula (1). These expressions are obtained by writing
univariate correction Formula II for R13 and R23, squaring, and solving for
r13^2 and r23^2, respectively:







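This is identical to Thorndike's Formula #8.
Substituting (5) in (7) gives:



which may be simplified to:

\[
R_{12} = r_{12} \sqrt{\left[ 1 + R_{13}^2 \left( \frac{s_3^2}{S_3^2} - 1 \right) \right]
\left[ 1 + R_{23}^2 \left( \frac{s_3^2}{S_3^2} - 1 \right) \right]}
- R_{13} R_{23} \left( \frac{s_3^2}{S_3^2} - 1 \right)
\]

which is Formula III B.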
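
The relation among the basic formula and its two variants can be checked numerically. The sketch below is not taken from the memorandum: it writes Formula II and Formula III in their standard textbook forms, writes III A and III B in forms obtained by substituting the inverted Formula II into Formula III, and confirms that all three routes give the same unrestricted estimate when they are fed mutually consistent inputs. The function names and the input values are hypothetical.

```python
from math import sqrt


def formula_ii(r13, k):
    """Formula II: unrestricted R13 from restricted r13, with k = S3 / s3."""
    return r13 * k / sqrt(1.0 - r13 ** 2 + (r13 * k) ** 2)


def formula_iii(r12, r13, r23, k):
    """Formula III (Thorndike #7): all correlations from the restricted group."""
    u = k ** 2 - 1.0
    return (r12 + r13 * r23 * u) / sqrt((1.0 + r13 ** 2 * u) * (1.0 + r23 ** 2 * u))


def formula_iii_a(r12, r13, R23, k):
    """Variant III A: r13 from the restricted group, R23 from the unrestricted."""
    u = k ** 2 - 1.0
    w = 1.0 - 1.0 / k ** 2
    top = r12 * sqrt(1.0 - R23 ** 2 * w) + r13 * R23 * (k - 1.0 / k)
    return top / sqrt(1.0 + r13 ** 2 * u)


def formula_iii_b(r12, R13, R23, k):
    """Variant III B: both R13 and R23 from the unrestricted group."""
    w = 1.0 - 1.0 / k ** 2
    return r12 * sqrt((1.0 - R13 ** 2 * w) * (1.0 - R23 ** 2 * w)) + R13 * R23 * w


# Hypothetical restricted-group values (not taken from the memorandum's tables).
r12, r13, r23, k = 0.30, 0.40, 0.50, 1.5
R13 = formula_ii(r13, k)
R23 = formula_ii(r23, k)

base = formula_iii(r12, r13, r23, k)
via_a = formula_iii_a(r12, r13, R23, k)
via_b = formula_iii_b(r12, R13, R23, k)
print(base, via_a, via_b)
```

With exact inputs the three estimates coincide; in practice the unrestricted correlations come from a different sample than the restricted ones, so the variants will differ from Formula III by sampling error.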



The intercorrelation matrix for the unrestricted population is shown
in Table 1. The intercorrelation matrix for Restricted Sample I is shown
in Table 2. This matrix was corrected by univariate correction Formulas
III A and III B, resulting in the intercorrelations in Table 3 and the
errors in Table 4. The upper half of each matrix refers to those
correlations corrected by III A; the lower half, those corrected by III B. The
errors are of the same order of magnitude as those for the basic univariate
correction Formula III, and can be attributed to sampling errors in the
original restricted sample (n = 300; σ_r = .058).



Tables 1-4: intercorrelation matrices for the unrestricted population and for
Restricted Sample I, the corrected intercorrelations, and the errors (table
bodies not legible in the source).



Personnel Laboratory
Wright Air Development Division
Air Research and Development Command
United States Air Force
Lackland Air Force Base, Texas

Technical Memorandum

WWRDP-TM-60-40                         Selection and Classification Branch

27 September 1960                      Project 7717-87003



ON THE USE OF A COMPOSITE SIMULATING COMPLEX SELECTION

Real problems are seldom as simple, clear-cut, and neatly soluble
as a student might expect from perusal of his textbooks. Consider,
for example, a situation where selection for admission to a training
program involves:

1. A selection composite consisting of two aptitude test composites,
some demographic variables, a special ability test score, a personality
test score, and a rating of past performance.

2. Elimination of those not meeting minimum cutoffs on one of the
aptitude composites, one of the demographic variables, and the special
ability test.

3. Application of the selection composite to the group meeting the
multiple cutoff requirements, except that bonus points are added to the
composite scores of certain candidates for various extraneous factors.

4. Elimination of 10% of the selectees by administrative action,
negatively correlated with other factors in the system, and based in part
on an interview of the candidate.

Research under such conditions may be somewhat tenuous, even where
the worker has great insight into the system and statistical
sophistication. Evaluating the selection or components thereof, introducing
controls in training studies, or correcting validities for range
restriction are rather formidable tasks, and may involve so many tenuous
assumptions and devious practices as to cast doubt on the results.

This memorandum proposes that such a complex selection process be simulated
by creating a multiple regression system, generated on the full applicant
group. This system uses as criterion a dichotomous variable, "1" if the
candidate was ultimately selected, "0" if rejected. Scores on the selection
variables may be used as predictors. If this multiple correlation is
reasonably high (as it usually would be), the regression composite may be
taken as simulating the complex selection. A high correlation indicates
that most of the contributing variables of the actual system (or their
equivalent) have been taken into account. Also, by examination of the
effective weights in the simulation composite, useful information may be
obtained regarding the role of the various selection variables.
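
A minimal sketch of the proposal follows, using only the standard library. The applicant data, the "actual" selection rule, and all function names are invented for illustration; the memorandum gives no computing details. The sketch fits least-squares weights to a 0/1 selection criterion, takes the fitted composite as the simulated selection score, cuts it at the actual selection ratio, and reports the multiple correlation and the phi coefficient between actual and "predicted" selection.

```python
from itertools import product
from math import sqrt


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_composite(predictors, selected):
    """Least-squares weights (intercept first) for a 0/1 selection criterion."""
    rows = [[1.0] + [float(v) for v in p] for p in predictors]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, selected)) for i in range(p)]
    return solve(xtx, xty)


def pearson(u, v):
    """Product-moment correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def phi(actual, predicted):
    """Phi coefficient between two 0/1 classifications."""
    a = sum(1 for x, y in zip(actual, predicted) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(actual, predicted) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(actual, predicted) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(actual, predicted) if x == 0 and y == 0)
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical applicant group: two predictor scores on a 7 x 7 grid, with
# "actual" selection defined by an invented rule (sum of scores above zero).
applicants = list(product(range(-3, 4), repeat=2))
selected = [1 if x1 + x2 > 0 else 0 for x1, x2 in applicants]

weights = fit_composite(applicants, selected)
composite = [weights[0] + weights[1] * x1 + weights[2] * x2 for x1, x2 in applicants]
multiple_r = pearson(selected, composite)

# Cut the simulated composite at the actual selection ratio.
n_selected = sum(selected)
order = sorted(range(len(applicants)), key=lambda i: composite[i], reverse=True)
chosen = set(order[:n_selected])
predicted = [1 if i in chosen else 0 for i in range(len(applicants))]

print(round(multiple_r, 3), round(phi(selected, predicted), 3))
```

In this toy case the multiple correlation is about .82 while phi is 1, which illustrates how a composite can reproduce every selection decision even though the multiple correlation is well short of unity.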

If the multiple correlation is low, either important aspects of the
selection have not been taken into account and further investigation of
the bases for selection is indicated, or the selection is not being
carried out in accordance with the explicitly stated rules.

If the distribution of scores on the simulated selection composite
is cut at the actual selection ratio, the phi coefficient between actual
and "predicted" selection may be regarded as an additional index of
simulation. It should be noted that phi may equal unity even when the
multiple correlation is appreciably less than 1. If there is perfect
(or near perfect) accounting of actual selection as measured by the phi
coefficient, the simulation composite may be considered as a simpler
selection device, accomplishing the same result as the elaborate and
complex procedures actually used. For this purpose the simulation does not
have to be perfect as measured by the multiple correlation. Actual
recommendation of the simulation composite in lieu of the complex selection
procedure would assume no change in either the intended bases of selection
or the selection ratio. If such changes are contemplated, the appropriate
simulation composite and selection ratio can be examined for the consequences.

The level of the multiple correlation, and hence degree of simulation,
may be increased by introducing dichotomous predictor variables based on
the level of the multiple cutoffs in the system. Dichotomous variables are
also indicated where arbitrary metric weighting has been used for various
levels of an ordered qualitative variate (e.g., military rank). Introduction
of apparently extraneous factors may also increase simulation and provide
further information on the selection process.

The simulation composite may also be useful as a basis for range
restriction corrections where the multivariate methods would not be feasible.
It would not be necessary to assume that selection was confined to
truncations in a multinormal applicant distribution. However, for this purpose,
the validity of the simulated selection must be very high as measured by
the multiple (probably greater than .90). This procedure tends to
undercorrect rather than overcorrect.

An initial tryout of the simulation method was performed by
Mr. Valentine, using some OCS data available on applicants prescreened at
5 on Officer Quality. The qualified applicant group was then subjected
to complex selection in accordance with rules that were operational at
that time (but which have since been modified). No attempt was made to
introduce some of the refinements in the simulation regression as suggested
above, e.g., level dichotomies. The multiple with actual selection was .70.
Regressed selection scores were computed on a random sample of 90 cases,
stratified by selection-rejection so as to preserve the initial selection
ratio. The regressed score distribution was cut at the selection ratio
(.52 selected) and the phi obtained between "predicted" and actual selection
was .91. Of the 90 cases, two selectees were "predicted" as rejectees
and two rejectees "predicted" to be selectees, a total of four errors of
classification in 90 decisions.
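
The reported phi can be reproduced approximately from the classification counts. The memorandum gives only the four misclassifications among 90 cases, so the split into 47 selectees and 43 rejectees used below is an assumption made for illustration.

```python
from math import sqrt


def phi_from_counts(a, b, c, d):
    """Phi coefficient for a 2 x 2 table.

    a: selected and predicted selected     b: selected, predicted rejected
    c: rejected, predicted selected        d: rejected and predicted rejected
    """
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Assumed split of the 90 cases: 47 selectees and 43 rejectees. The memorandum
# reports only the four misclassifications, so these totals are illustrative.
value = phi_from_counts(a=45, b=2, c=2, d=41)
print(round(value, 2))
```

Under the assumed split the result rounds to .91, in line with the figure reported for the tryout.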

Prepared by:
John A. Creager, WWRDPS

PUBLICATION REVIEW
This report has been reviewed and is approved.



A. Carp, Technical Director
Personnel Laboratory



Distribution: WADD






PERSONNEL RESEARCH LABORATORY
Air Force Personnel and Training Research Center
Air Research and Development Command
Lackland Air Force Base, Texas

LABORATORY NOTE PRL-I^J^SS^-W^                              26 April 1955

7701-77023



Correcting Correlation Coefficients for Selection 
When the Nature of the Selection is Unknown



John A. Creager
Robert G. Smith



Problem

In many practical problems sources of selection in addition to direct
truncation render the Thorndike correction formulas inadequate for
estimating correlation coefficients for an unrestricted population. This note
presents a procedure designed to correct coefficients attenuated by
selection, without making the highly restrictive assumptions usually more or less
violated in applying the Thorndike formulas.



Assumptions 

The method presented in this note assumes that:

1. the unrestricted bivariate frequency distribution is normal and
homoscedastic for both variables,

2. selection has resulted only in decreased frequencies in certain
cells of the bivariate frequency distribution,

3. the selection ratio is known.

The first assumption implies that the unrestricted regressions are
linear. The second assumption rules out additions to the sample due to
transfers, holdovers, etc.

The last assumption ideally refers to total per cent losses from
testing to criterion data collection regardless of source. In practice the
selection will generally include truncation, as qualified by cases admitted
below the cut-off to fulfill quotas, administrative losses above the cut-off,
early eliminations, etc. Thus there is no restriction of selection to
truncation of the tail of a distribution or assumption that the slope of one
regression line be unaffected by the selection.



*This paper is an informal note and is subject to modification or
withdrawal at any time. If referenced, it should be described as an "unpublished
draft."






Method



A normal bivariate scatter-plot for a correlation of .294 on 1,000
cases is presented in Table 1. This was subjected to truncation and several
arbitrary losses throughout the matrix to yield the restricted scatter-plot
in Table 2. This is bordered by the marginal frequency distributions in the
restricted sample (N = 550). In a practical problem one obtains this matrix
and the per cent loss (45.0). The problem is then to try to reproduce the
normal bivariate frequency distribution from which the selection sample was
obtained. From the per cent loss and restricted sample size it may be
inferred that the unrestricted sample size was 1,000. The marginal
frequencies may then be determined from the areas under the normal curve. In this
example stanine distributions were used. The scatter-plot in Table 2 is
then further bordered by a row and column of discrepancies (d-values)
between the row (or column) sum and the unrestricted marginal frequencies.
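
The bookkeeping in this step can be sketched as follows. The column sums and theoretical marginals are the values read from Table 2; the function names are hypothetical and the code is only an illustration of the arithmetic, not a procedure given in the note.

```python
# Column (score 1..9) sums of the restricted scatter-plot, as read from Table 2.
restricted_column_sums = [0, 0, 20, 83, 121, 144, 98, 54, 30]
# Theoretical stanine marginals for N = 1000, as listed in Tables 1 and 2.
theoretical_marginals = [40, 66, 121, 174, 198, 174, 121, 66, 40]


def unrestricted_n(restricted_n, percent_loss):
    """Infer the unrestricted sample size from the restricted size and loss."""
    return round(restricted_n / (1.0 - percent_loss / 100.0))


def d_values(theoretical, observed):
    """Discrepancy between each theoretical marginal and the observed sum."""
    return [t - o for t, o in zip(theoretical, observed)]


n_total = unrestricted_n(550, 45.0)
d_cols = d_values(theoretical_marginals, restricted_column_sums)
print(n_total, d_cols, sum(d_cols))
```

The d-values sum to the number of cases lost (450 here), which gives a convenient check on the bordering row and column.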



Table 1

Normal Bivariate Frequency Distribution
(N = 1000; r = .294)

           1    2    3    4    5    6    7    8    9      Σ     *Σ
     9     0    1    2    4    7    8    8    6    5     41     40
     8     1    2    4    8   12   14   11    7    6     65     66
     7     2    4   10   18   24   24   19   11    8    120    121
     6     4    8   18   29   36   34   24   14    8    175    174
     5     7   12   24   36   41   36   24   12    7    199    198
     4     8   14   24   34   36   29   18    8    4    175    174
     3     8   11   19   24   24   18   10    4    2    120    121
     2     6    7   11   14   12    8    4    2    1     65     66
     1     5    6    8    8    7    4    2    1    0     41     40
     Σ    41   65  120  175  199  175  120   65   41   1001
    *Σ    40   66  121  174  198  174  121   66   40          1000

*Theoretical sum used as basis of d-values.

Table 2

Restricted Bivariate Frequency Distribution
(N = 550; r = .186)

           1    2    3    4    5    6    7    8    9      Σ      d   Check Σ
     9     0    0    0    0    7    7    7    5    5     31      9      40
     8     0    0    0    5    7   12   10    5    6     45     21      66
     7     0    0    0   15   20   24   15   10    6     90     31     121
     6     0    0    0   10   30   30   20   10    5    105     69     174
     5     0    0   20   35    0   20   20   10    5    110     88     198
     4     0    0    0    0   20   25   10    7    2     64    110     174
     3     0    0    0   10   20   15   10    4    0     59     62     121
     2     0    0    0    0   10    8    4    2    1     25     41      66
     1     0    0    0    8    7    3    2    1    0     21     19      40
     Σ     0    0   20   83  121  144   98   54   30    550
     d    40   66  101   91   77   30   23   12   10           450
Check Σ   40   66  121  174  198  174  121   66   40                  1000






Since a normal distribution is diagonally symmetrical
about the center cell (xy), the frequencies in the restricted bivariate
distribution are built up to yield a symmetrical distribution. For example,
cells 3, 3; 3, 6; 4, 7; and 7, 7 should contain the same frequency. The
largest value, 15 (in cell 7, 7), is placed in all four cells. This is done for
the whole matrix until the desired symmetry is obtained with a minimal
addition of cases. The row and column sums, and the d-values, are readjusted.
The resulting matrix is shown in Table 3.
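
One way to read the symmetrizing step is sketched below. The rule used (raise every cell to the largest frequency among its mirror-image cells) is an assumption about what "built up" means; it gives the smallest symmetric table that only adds cases, but it is not claimed to reproduce Table 3 cell for cell. The 3 x 3 input is invented.

```python
def symmetrize(table):
    """Raise each cell to the largest frequency among its mirror-image cells.

    Assumed rule: cell (r, c) is grouped with (c, r), (n-1-r, n-1-c) and
    (n-1-c, n-1-r); this is one reading of "diagonally symmetrical".
    """
    n = len(table)
    out = [row[:] for row in table]
    for r in range(n):
        for c in range(n):
            mirrors = [(r, c), (c, r), (n - 1 - r, n - 1 - c), (n - 1 - c, n - 1 - r)]
            out[r][c] = max(table[i][j] for i, j in mirrors)
    return out


# Small invented 3 x 3 scatter-plot, not taken from the tables.
restricted = [
    [1, 0, 4],
    [2, 9, 3],
    [5, 1, 0],
]
built_up = symmetrize(restricted)
added = sum(map(sum, built_up)) - sum(map(sum, restricted))
print(built_up, added)
```

After this step the marginal sums and d-values have to be recomputed, as the text notes.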

Noting that each row and column of a bivariate distribution is unimodal,
one next proceeds to remove any inversions by increasing the frequency in the
"troublesome" cell to the lowest value in an adjacent cell. When this is
done for the whole matrix, symmetry will be retained, inversions in the
marginals will disappear, but will now appear among the d-values. The resulting
matrix for the example is shown in Table 4 with marginal and adjusted d-values.
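
The removal of inversions can be illustrated for a single row or column. The reading of the rule below (a cell lower than both neighbours is raised to the lower of the two adjacent frequencies) is an assumption, and the sketch does not handle flat-bottomed dips spanning several cells.

```python
def remove_inversions(array):
    """Fill interior dips so a row or column rises to one peak and then falls.

    Assumed reading of the rule: a cell lower than both neighbours is raised
    to the lower of the two adjacent frequencies.
    """
    out = list(array)
    changed = True
    while changed:
        changed = False
        for i in range(1, len(out) - 1):
            if out[i] < out[i - 1] and out[i] < out[i + 1]:
                out[i] = min(out[i - 1], out[i + 1])
                changed = True
    return out


# Middle row as read from Table 3 (the centre cell was emptied by selection).
middle_row = [7, 10, 20, 35, 0, 35, 20, 10, 7]
print(remove_inversions(middle_row))
```

Applied to the middle row, the empty centre cell is raised to 35, the value of both of its neighbours.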



Table 3

Symmetrized Bivariate Frequency Distribution
(r = .333)

           1    2    3    4    5    6    7    8    9      Σ      d   Check Σ
     9     0    1    0    2    7    7    7    6    5     35      5      40
     8     1    2    4    7   10   12   10    5    6     57      9      66
     7     0    4   10   15   20   24   15   10    7    105     16     121
     6     2    7   15   20   35   30   24   12    7    152     22     174
     5     7   10   20   35    0   35   20   10    7    144     54     198
     4     7   12   24   30   35   20   15    7    2    152     22     174
     3     7   10   15   24   20   15   10    4    0    105     16     121
     2     6    5   10   12   10    7    4    2    1     57      9      66
     1     5    6    7    7    7    2    0    1    0     35      5      40
     Σ    35   57  105  152  144  152  105   57   35    842
     d     5    9   16   22   54   22   16    9    5           158
Check Σ   40   66  121  174  198  174  121   66   40                  1000



Table 4

Bivariate Frequency Distribution After Removal of
Inversions in the Arrays (r = .321)

           1    2    3    4    5    6    7    8    9      Σ      d   Check Σ
     9     0    1    1    2    7    7    7    6    5     36      4      40
     8     1    2    4    7   10   12   10    6    6     58      8      66
     7     1    4   10   15   24   24   15   10    7    106     15     121
     6     2    7   15   20   35   30   24   12    7    152     22     174
     5     7   10   20   35   35   35   20   10    7    187     11     198
     4     7   12   24   30   35   20   15    7    2    152     22     174
     3     7   10   15   24   24   15   10    4    1    106     15     121
     2     6    6   10   12   10    7    4    2    1     58      8      66
     1     5    6    7    7    7    2    1    1    0     36      4      40
     Σ    36   58  106  152  187  152  106   58   36    891
     d     4    8   15   22   11   22   15    8    4           109
Check Σ   40   66  121  174  198  174  121   66   40                  1000






The next step is to reduce the d-values by applying "contingency"
corrections to the cell frequencies. The correction for a given cell, c_ij,
is d_i d_j / Σd, where Σd is the total of all deviation values for the matrix.
Where the correction is nearly halfway between two integral values, the
larger one is taken and recorded with a minus sign after it. The marginals
and d-values are readjusted. The result of this operation is shown in
Table 5.
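
The contingency correction can be written out directly from the rule just stated. The d-values below are those bordering Table 4; the function name is hypothetical and the rounding convention with the trailing minus sign is left out of the sketch.

```python
def contingency_corrections(d_rows, d_cols):
    """Correction for cell (i, j): d_i * d_j / (total of the d-values)."""
    total = sum(d_rows)
    if total != sum(d_cols):
        raise ValueError("row and column d-values must have the same total")
    return [[di * dj / total for dj in d_cols] for di in d_rows]


# d-values bordering Table 4 (rows and columns share the same list there).
d = [4, 8, 15, 22, 11, 22, 15, 8, 4]
corrections = contingency_corrections(d, d)

# Cell in the third row, sixth column: d-values 15 and 22.
print(round(corrections[2][5], 2))
```

For that cell the correction is about 3, so a frequency of 24 becomes 27. Summed over the whole matrix, the corrections equal the total of the d-values, which is why applying them brings the marginals close to their theoretical values.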

Table 5

Bivariate Frequency Distribution After Application
of Contingency Corrections (r = .284)

            1     2     3     4     5     6     7     8     9      Σ      d   Check Σ
     9      0     1     2     3     7     8     8     6     5     40      0      40
     8      1     3-    5     9-   11-   14-   11     7     6     67     -1      66
     7      2     5    12    18    21    27    17    11     8    121      0     121
     6      3     9-   18    25    37    35    27    14     8    176     -2     174
     5      7    11-   21    37    36    37    21    11-    7    188     10     198
     4      8    14    27    35    37    25    18     9-    3    176     -2     174
     3      8    11    17    27    21    18    12     5     2    121      0     121
     2      6     7    11    14    11-    9-    5     3-    1     67     -1      66
     1      5     6     8     8     7     3     2     1     0     40      0      40
     Σ     40    67   121   176   188   176   121    67    40    996
     d      0    -1     0    -2    10    -2     0    -1     0             4
Check Σ    40    66   121   174   198   174   121    66    40                  1000



Small adjustments are made reducing frequencies by 1 which have values
in a row with a negative d-value. One starts with cells most removed from
the regression line until minus signs are removed or the d-value for the
row is no longer negative, whichever occurs first. Finally the contingency
principle is reapplied using readjusted d-values. The finally obtained
corrected scatter-plot is shown in Table 6. Table 7 shows the discrepancies
between Tables 1 and 6.

Table 6

Bivariate Frequency Distribution Corrected for
Arbitrary Selection Losses (r = .293)

           1    2    3    4    5    6    7    8    9      Σ      d   Check Σ
     9     0    1    2    3    7    8    8    6    5     40      0      40
     8     1    2    5    8   12   14   11    7    6     66      0      66
     7     2    5   12   18   21   27   17   11    8    121      0     121
     6     3    8   18   25   36   35   27   14    8    174      0     174
     5     7   12   21   36   46   36   21   12    7    198      0     198
     4     8   14   27   35   36   25   18    8    3    174      0     174
     3     8   11   17   27   21   18   12    5    2    121      0     121
     2     6    7   11   14   12    8    5    2    1     66      0      66
     1     5    6    8    8    7    3    2    1    0     40      0      40
     Σ    40   66  121  174  198  174  121   66   40   1000
     d     0    0    0    0    0    0    0    0    0             0
Check Σ   40   66  121  174  198  174  121   66   40                  1000






Results 



The errors shown in Table 7 are quite small and concentrated for the
most part near the regression line. The ±1 errors farther out may be
attributed to rounding errors in Table 1.

The corrected correlation coefficient computed from the scatter-plot
in Table 6 is .293 (as compared with .294 in Table 1). The uncorrected
coefficient computed from Table 2 is .186, which corrected by Thorndike
Formula 6 becomes .243.
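
The two coefficients can be recomputed from the scatter-plots. The sketch below assumes that rows run from score 9 at the top to score 1 at the bottom and columns from score 1 to score 9, and it uses the cell frequencies as read from Tables 2 and 6; the function name is hypothetical.

```python
from math import sqrt


def correlation(table):
    """Pearson r from a square frequency table.

    Rows are taken to run from the highest score (top) to the lowest and
    columns from the lowest (left) to the highest, as in the tables here.
    """
    n = len(table)
    total = sx = sy = sxx = syy = sxy = 0
    for r, row in enumerate(table):
        y = n - r
        for c, f in enumerate(row):
            x = c + 1
            total += f
            sx += f * x
            sy += f * y
            sxx += f * x * x
            syy += f * y * y
            sxy += f * x * y
    cov = sxy - sx * sy / total
    return cov / sqrt((sxx - sx * sx / total) * (syy - sy * sy / total))


table_2 = [  # restricted scatter-plot
    [0, 0, 0, 0, 7, 7, 7, 5, 5],
    [0, 0, 0, 5, 7, 12, 10, 5, 6],
    [0, 0, 0, 15, 20, 24, 15, 10, 6],
    [0, 0, 0, 10, 30, 30, 20, 10, 5],
    [0, 0, 20, 35, 0, 20, 20, 10, 5],
    [0, 0, 0, 0, 20, 25, 10, 7, 2],
    [0, 0, 0, 10, 20, 15, 10, 4, 0],
    [0, 0, 0, 0, 10, 8, 4, 2, 1],
    [0, 0, 0, 8, 7, 3, 2, 1, 0],
]
table_6 = [  # corrected scatter-plot
    [0, 1, 2, 3, 7, 8, 8, 6, 5],
    [1, 2, 5, 8, 12, 14, 11, 7, 6],
    [2, 5, 12, 18, 21, 27, 17, 11, 8],
    [3, 8, 18, 25, 36, 35, 27, 14, 8],
    [7, 12, 21, 36, 46, 36, 21, 12, 7],
    [8, 14, 27, 35, 36, 25, 18, 8, 3],
    [8, 11, 17, 27, 21, 18, 12, 5, 2],
    [6, 7, 11, 14, 12, 8, 5, 2, 1],
    [5, 6, 8, 8, 7, 3, 2, 1, 0],
]
print(round(correlation(table_2), 3), round(correlation(table_6), 3))
```

With these frequencies the computation gives .186 for the restricted table and .293 for the corrected table, matching the coefficients quoted above.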



Table 7

Matrix of Discrepancies Between Unrestricted and
Corrected Bivariate Frequency Distributions*

           1    2    3    4    5    6    7    8    9      Σ
     9     0    0    0   -1    0    0    0    0    0     -1
     8     0    0   +1    0    0    0    0    0    0     +1
     7     0   +1   +2    0   -3   +3   -2    0    0     +1
     6    -1    0    0   -4    0   +1   +3    0    0     -1
     5     0    0   -3    0   +5    0   -3    0    0     -1
     4     0    0   +3   +1    0   -4    0    0   -1     -1
     3     0    0   -2   +3   -3    0   +2   +1    0     +1
     2     0    0    0    0    0    0   +1    0    0     +1
     1     0    0    0    0    0   -1    0    0    0     -1

                                                   ΣE = -1

*Cell values are corrected values from Table 6 minus original
values from Table 1.



Conclusion 

This note presents a method for correcting correlation coefficients
for selection. It is designed to have more general applicability than
the Thorndike formulas, which assume simple truncation (either direct or
indirect) as the sole source of bias. The method presented was illustrated
by an example involving an unrestricted correlation of about .30 subjected
to various kinds of losses. The example was based on 1,000 cases for the
unrestricted sample and 45 per cent loss by selection. Further
investigation is required to ascertain the scope and limitations of the method,
particularly as it is affected by small sample size and extreme percentage
losses. Further investigation is also required to determine the
adaptability of the method for special problems frequently encountered in
practice. These problems include the cases where the unrestricted marginal
distributions are not normal, the unrestricted scatter-plots are
curvilinear, or an erroneous estimate is made of the selection ratio.



APPENDIX B 



MEMORANDA FOR MISOE 






March 3, 1972 



Dr. William G. Conroy, Jr.
Division of Occupational Education 
1017 Main Street 

Winchester, Massachusetts 01890 
Dear Bill: 

This letter comments primarily on OPs #2 and 4. It is useful to
approach the differentiations and instrumentation of the IPPI in terms of
roles these elements play in the total MISOE. In the case of the input space,
the student data are needed to characterize input to computer flow models,
studying manpower issues, etc. They are also needed as control variables in
analyses of product and impact outcomes from processes, and therefore should
include indicators of pre-process experiences and capabilities relevant to such
outcomes. While student data are needed as such for analyses where the
student is the analytic unit, aggregate summary data on the students entering
particular programs, schools, etc., are required where these are the analysis
units. The distinction between local, state, federal, and other capital
data for cost-benefit analyses may require some arbitrary decisions,
explicitly stated and uniformly applied, when dollar input to a program coming
directly from a LEA indirectly comes from state funds, which in turn may have
come partially from federal sources. Thus, the identifiability of
expenditures by these distinctions may trip over their lack of independence.
Identifying by source chains may be helpful.

At some point we should probably have a look at admissions requirements
and variations in such requirements across schools giving the "same"
programs. This may be more important in the post secondary programs.

Variable selection and instrumentation in input space seem straightforward
except for ensuring equivalence of "scores" (e.g., IQ) from different
instruments purporting to measure the same thing, and for ensuring acquisition
of the prior experience information mentioned above.

It is in the process space that manipulability and the feedback
of results of decision making are most relevant. The present delineation of
this space (OP#2, Fig. 2) as elaborated in conference seems excellent,
as is the explicit provision for obtaining cost data within this space
(OP#2, P. 8), especially for the physical factors and for personnel. Also, I
gather that student perceptions of the process belong here under "perceptual"
as an exception to the human factors referring to nonstudent personnel.

One basis for classification of physical factors (within either
structural or instructional types) would be their joint occurrence across
schools and programs. It should not be necessary to include in analyses of
the products and impacts of process two physical factors which nearly always
occur together; nor, if the jointly occurring factors were ordered variables
rather than qualitative conditions, would it be necessary to measure both.






I take it that the breakdown of an instructional event such as the
Fortran programming course example into blocks and units yields examples of
organizational factors and should include information on sequencing.

EW's delineation of organizational factors, a-d (OP#4, P. 3-4)
should include a fifth factor: operating rules such as accessibility of
physical equipment to the student. An alternative is to identify such
"rules of organization" with decisional behaviors under human factors. The
confusion arises because we are dealing with role incumbent decisions about
organizational factors.

Obtaining both process and cost information for the process space
depends very much on what is already documented at the local level, the degree
of consistency in such documentation across schools giving similar programs,
and the logistic flexibilities or constraints you may encounter in obtaining
data on bases that ensure comparability across potential analysis groups.

Student perceptions of process may be picked up by a simple, objective
but confidential questionnaire focused on the nature and amount of
teacher contact, fast feedback of evaluations of the student's performance,
whether the atmosphere permitted the student to resolve perplexities, and
EW's suggestion about the student's feeling of some degree of control over
the learning situation.

EW's discussion of decision making in the hierarchical arrangement
of the process space, with higher level decisions constraining decisions and
other process factors at the lower level, may have some special analysis
implications. The presence or absence of such constraints can be indicated by
dichotomous variables in regression. It may be that these constraints can
be expressed in constraint equations in the case of linear programming models
or as modified transition probabilities in flow models.
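The use of a dichotomous variable to flag a higher-level constraint can be sketched as below. This is a minimal illustration under stated assumptions, not a MISOE procedure: the variable names (`hours`, `constrained`), the data, and the helper functions are all hypothetical, and ordinary least squares is solved through the normal equations in plain Python.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical program-level data: (hours, constrained, outcome).
# constrained = 1 where a higher-level decision limits the lower level.
data = [
    (10, 0, 52.0), (12, 0, 56.0), (14, 0, 60.0), (16, 0, 64.0),
    (10, 1, 47.0), (12, 1, 51.0), (14, 1, 55.0), (16, 1, 59.0),
]
rows = [[1.0, hours, constrained] for hours, constrained, _ in data]
y = [outcome for _, _, outcome in data]

intercept, slope_hours, effect_constraint = ols(rows, y)
print(intercept, slope_hours, effect_constraint)
```

The coefficient on the dichotomous variable estimates the shift in outcome associated with the constraint, holding the other predictor fixed.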

The addressing system for process space information appears reasonable
and even necessary to the functioning of the total system. The school, program,
and block subscripting arrangement is critical for identifying analysis units.
In the case of analyses where the student is the unit of analysis, it is
crucial that his data include the subscript in order to link him to the process
variables to which he is exposed.
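The linkage described above can be sketched as a keyed lookup. Everything in this example is an assumption made for illustration: the (school, program, block) tuple as the subscript, the field names, and the values.

```python
# Process-space records addressed by a (school, program, block) subscript.
# All identifiers and values here are hypothetical.
process_by_subscript = {
    ("S01", "P03", "B1"): {"teacher_contact_hours": 40, "equipment_access": 1},
    ("S01", "P03", "B2"): {"teacher_contact_hours": 25, "equipment_access": 0},
}

students = [
    {"student_id": "A", "subscript": ("S01", "P03", "B1"), "product_score": 71},
    {"student_id": "B", "subscript": ("S01", "P03", "B2"), "product_score": 64},
    {"student_id": "C", "subscript": ("S02", "P01", "B1"), "product_score": 58},
]

def link_students_to_process(students, process_by_subscript):
    """Attach process variables to each student record via the subscript.

    Students whose subscript has no process record are reported separately
    rather than silently dropped.
    """
    linked, unmatched = [], []
    for student in students:
        process = process_by_subscript.get(student["subscript"])
        if process is None:
            unmatched.append(student["student_id"])
        else:
            linked.append({**student, **process})
    return linked, unmatched

linked, unmatched = link_students_to_process(students, process_by_subscript)
print(len(linked), unmatched)
```

Reporting unmatched students makes a missing or miscoded subscript visible before any analysis is run.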

You expressed concern about summing capabilities within and across 
programs. If I interpret the concern correctly, it is an analytic rather 
than a product space differentiation problem. In my last letter, I commented 
on some of the pros and cons of weighted summaries versus configural approaches 
to combining outputs in both product and impact space as "dependent variables" 
in analysis. 

In your suggestion to expand Figure 4 into a 2-way table in terms of
geographic space, you may be able to capitalize, for the local, regional, and
state levels, on the notion of geopolitical stratification suggested in my last
letter. In any case you will probably need geographic and occupational migration
data in the later development of the impact space along these lines.






The instrumentation of the impact space is difficult and it is here
that I am hopeful that contacts with the DOD occupational analysis systems
may be helpful for parts of the job, especially in the "Self" portions.
Supplementing with followup questionnaires to subjects and their employers
should also be helpful in both self and society portions.

Although I was thinking in terms of predicting impact space
variables from process (and product) variables, your example of "equal
opportunity" measures suggests the juggling of system parameters in simulation
and comparison of different racial mixes within occupations that result with
both the current actual mix and the "ideal" mixes defined by someone's value
judgement.

In the instrumentation task, it will be important for analysis that
instrument reliability be high and, in all cases, either known or plausibly
estimable. While a good deal of useful summary descriptive information can
be obtained efficiently with moderately reliable instruments, the effect of
measurement error in feeding back erroneous inferences into the system may be
cumulative. A forthcoming ACE report discusses these matters and provides a
useful list of references; even though our concerns are relevant to higher
education, the same principles apply to occupational educational data.
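One standard way to see why reliability must be known or estimable is the classical attenuation relation, under which an observed correlation equals the true-score correlation multiplied by the square root of the product of the two reliabilities. The sketch below is an illustration only; the function names and numeric values are hypothetical and are not drawn from the ACE report mentioned above.

```python
import math

def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Observed correlation implied by a true-score correlation."""
    return true_r * math.sqrt(reliability_x * reliability_y)

def disattenuated_correlation(observed_r, reliability_x, reliability_y):
    """Estimate of the true-score correlation from an observed one."""
    return observed_r / math.sqrt(reliability_x * reliability_y)

# Hypothetical reliabilities for two moderately reliable instruments.
observed = attenuated_correlation(0.50, reliability_x=0.70, reliability_y=0.60)
recovered = disattenuated_correlation(observed, 0.70, 0.60)
print(round(observed, 3), round(recovered, 3))
```

The correction is only as good as the reliability estimates fed into it, which is the practical reason for requiring them.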

Another instrumentation issue is the possibility, where the same
instrument is to be administered to large segments of the sample, of designing
or adopting instruments that can be read on an optical scanner. Given
sufficient volume (N of 5,000 or so) a great deal of information
can be obtained very efficiently and result in data input tapes for the
computerized aspects of the system. This applies to instruments with objective
formats (check lists, multiple choice, etc., rather than open-ended or essay
response). You will probably have this constraint anyway where decentralized
administration and limited testing time are at issue.

I plan to write one more letter commenting on the computerized
information system (Tasks 5-8, and OP#3). The requirement that the system
be ongoing and expandable suggests that EW's subscripting codes for data units
may have to be expanded to identify the time-cohort involved. This may not
be necessary if all data are permanently stored by time-cohort on labeled
tapes and input to temporary computer storage by special programs written in
terms of the addressing system.



Sincerely,




John A. Creager



JAC/mak 

cc: D. Tiedeman 
J. Kaufman






American Council on Education 

ONE DUPONT CIRCLE
WASHINGTON, D. C. 20036



OFFICE OF RESEARCH                                  February 28, 1972



Dr. William G. Conroy, Jr. 
Division of Occupational Education 
1017 Main Street 

Winchester, Massachusetts 01890 
Dear Bill:

This is to provide some substantive comment on OP#1 and the sampling
task differentiation section of OP#2 in the light of our conference discussions
and subsequent rereading. The design of the census level of information
indicated in Figure 1 appears satisfactory for accomplishing its 3-fold purpose
stated on page 2.

In regard to the sample information system, let me first explicitly
distinguish (as you already have) the sampling design and logistics from the
types of data to be collected on the sample. I would start sampling design by
taking all the schools in the "universe" and forming subuniverses by "school
types", treating each as a separate subsystem in MISOE development and for
sampling purposes. The various types contain different numbers of schools and
therefore provide different degrees of flexibility in developing samples.
Regarding the school types, the secondary schools constitute the largest and most
clearly defined group and will be used to comment on further sampling issues.
The proprietary schools are probably so different from the public schools, e.g.,
in the ease with which you will be able to obtain cooperation and possibly in
some more substantive matters, that you may want to treat this as a separate
school type; a small number of such schools will either preclude doing this or
will limit the degree to which finer subsampling of programs and students may be
achieved. Perhaps, to a lesser degree, community colleges and "schools" with
adult or MDTA occupational educational programs as "school types" will be
subject to similar considerations. Here it is desirable to have counts for the
whole state to aid in judging feasibility in delineating more detailed sampling
plans.

In the secondary school sector with some 1800 schools as the universe 
base, I would sort those into, say, four geopolitical groups, e.g., metropolitan 
Boston area, eastern "rural", western cities and towns, and western "rural". 
From your knowledge of the population density distribution and of the geographic 
distribution of secondary schools some reasonable definition of these categories 
should be possible. There is nothing sacred about either the number or labels 
on these categories; however, an increase in the number will provide and create 
subsequent problems. 

Within the geopolitical categories, a decision is needed as to
whether to sample LEA's or individual schools; at least in the larger communities,
the LEA may be the central agency covering two or more secondary schools. The
advantages of sampling LEA's and including all schools under a sampled LEA are:


1. direct meeting of the requirement that any LEA can be identified
with its geopolitical category

2. logistic convenience and possibly lower costs of data collection

3. a built in tendency to sample students in accordance with population
density patterns.

The disadvantage is that each school in the state does not have an equal
opportunity to be represented in the sample, but proper weighting of data in estimating
population totals can allow for disproportionate, random sampling within cells of
the sampling design.
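The weighting idea in the preceding paragraph can be sketched with expansion weights: within each cell of the design, every sampled LEA stands for N/n LEAs, where N is the number in the cell and n the number sampled. The strata labels follow the four geopolitical groups suggested earlier in this letter; the LEA identifiers, enrollments, and sample sizes are hypothetical. Enrollment is held constant within each stratum here purely so that the weighted estimate can be checked against the true total.

```python
import random

def draw_stratified_sample(units_by_stratum, n_by_stratum, seed=0):
    """Simple random sample within each stratum, with expansion weights N_h / n_h."""
    rng = random.Random(seed)
    sample = []
    for stratum, units in units_by_stratum.items():
        n = n_by_stratum[stratum]
        weight = len(units) / n
        for unit in rng.sample(units, n):
            sample.append({"stratum": stratum, "unit": unit, "weight": weight})
    return sample

def estimate_total(sample, value_of):
    """Weighted estimate of a population total from the sample."""
    return sum(row["weight"] * value_of(row["unit"]) for row in sample)

# Hypothetical universe: (LEA id, enrollment) pairs in four strata.
units_by_stratum = {
    "metropolitan Boston": [("LEA-%02d" % i, 900) for i in range(0, 12)],
    "eastern rural": [("LEA-%02d" % i, 200) for i in range(12, 20)],
    "western cities and towns": [("LEA-%02d" % i, 400) for i in range(20, 30)],
    "western rural": [("LEA-%02d" % i, 150) for i in range(30, 36)],
}
# Disproportionate sampling: different sampling fractions per stratum.
n_by_stratum = {
    "metropolitan Boston": 6,
    "eastern rural": 2,
    "western cities and towns": 5,
    "western rural": 2,
}

sample = draw_stratified_sample(units_by_stratum, n_by_stratum)
estimated_enrollment = estimate_total(sample, lambda unit: unit[1])
true_enrollment = sum(e for units in units_by_stratum.values() for _, e in units)
print(estimated_enrollment, true_enrollment)
```

With real data the enrollments vary within strata, so the estimate would differ from the true total by sampling error rather than match it exactly.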

Insofar as the LEA's may cover more than one school type, you may want
to take logistic advantage of that fact and coordinate the sampling of the other
school types with that of the secondary schools. This notion implies
nonrandomness in the sampling of the school types unless they exist in sufficient numbers
such that the set of schools in the sampling cells can be subdivided into those
so coordinated with the secondary school sampling and those which are not. This
would still not please a pure mathematical statistician but may be worth
considering if the "counts" are favorable and logistic convenience is a strong
trade-off point.

Taking all secondary schools within sampled LEA's (except taking a
maximum of three if any have more than three), one could take all programs as the
next sampling level and all students within programs up to a maximum by grade
level. To ascertain the feasibility of this approach and to determine what
modifications are required to meet cost and logistic constraints, all readily
retrievable data on the enrollments in all kinds of programs in secondary
schools should be examined, as well as their "geopolitical" distribution. One
would also need to check the U-}i-0 picture for each program to ensure that the 
kind of summaries in terms of inter-school similarities shown in your product 
data example will be possible. One would also want to ensure that the rare 
programs or those with unique objectives were represented in the sample. Your 
census data plan should provide the data necessary for sample planning, but 
preliminary counts, even guesstimates may have to be used, if you must draw 
samples before the first census implementation and with the idea of later ad- 
justments to the sampling. If the latter contingency can be avoided, fine. 

In conference I raised the question of multiple samples, possibly
overlapping, so that no LEA carries a full and continuing burden of data
collection and reporting, and so that a lost LEA (from some logistic goof or
refusal to cooperate) can be readily replaced. To this I add two thoughts:
resampling every nth year to take advantage of system changes shown by your
census data and the possibility of a modified (simplified?) hierarchy of
stratification for expenditure data. To ensure a tight linkage between
expenditure and other data, information should be obtained from the same ultimate
sampling units, so I am having second thoughts about separate samples for
detailed expenditure data. Also, as indicated in conference, comparisons of LEA
information on inputs, processes, and products on the sample may be summarized
within stratification cells without weighting the data, but comparisons with
data summarized across cells, or aggregated at the state level for estimates of
census parameters in various subsystems, will require data weighting under such
procedures as outlined above.

Sample-population relationships provide no serious problem if [illegible]
the census [illegible] computed from the census data, as appears to be the
case. The adequacy of weighting any data not obtained in the census will
depend on the correlation between those sample data items and the census data
items on which weights are based (and, of course, [illegible]).

In discussing "camera effects", it is my judgement that you may be
giving too much weight to this possibility at the expense of your other
consideration of implications for analysis. For sampling purposes each year's
[illegible]. [illegible] analysis, given adequate annual cohort sampling, real
[illegible] should be estimable from weighted samples. If you select new
samples between the end of process and product and [illegible], there will be
no basis for matching data on [illegible] longitudinal [illegible], of course,
be able to make descriptive summaries within [illegible] of cross-sectional
[illegible] followup of a cohort sample [illegible]. [illegible] the testing
effects are more serious with the [illegible] considering than those
[illegible]. [illegible] must also consider whether you have enough units
within the sampling design to proliferate additional samples ad infinitum.

[illegible] on OP#1 regarding data types. The basic IPPI element
structure is "right on". In my next communication I plan
to comment on space differentiations more fully. In the process space I would
certainly encourage the idea of picking up sequencing [illegible] wherever you
[illegible] going so far as to ensure [illegible]. The hypothesis that this is
an important factor in attainment of objectives is an important one.

[illegible] identifying process elements of a program accounting
for product [illegible] variance is one calling for the regression model; ditto
for [illegible] concerned about matching capability in the longitudinal data.
In order to deal with prediction of configurations of objectives, some decision
will be required about how they are [illegible] weighted: [illegible] thus
treating all as equally desirable, [illegible] important, and having
[illegible] costs and benefits (don't you believe it!), or one can assign
weights to objectives that allow variations [illegible] weights [illegible] the
classical canonical regression [illegible] which would assign weights to
maximize predictability of the resulting configuration,
and such weights are not necessarily the most relevant. For this reason, I see
no hurry about having a canonical regression capability in your computer software
unless discriminant analyses are anticipated. The above line of reasoning
assumes that once weights are assigned to objectives, the criterion collapses
into a single composite variable. This may not be the best way to operate on
predicting configurations, but it occurs to me that once MISOE is operational
some of the cost-benefit and impact information can be fed back to provide
improved weighting schemes for defining configurations. Another thought is to
group configurations and use discriminant functions to predict which subjects
are most likely to belong to which class of configurations. Considering that a
program with 10 objectives, achieved or not, would have 2^10 configurations, this
approach seems at first sight to be a formidable one. But in some ways, I find
a greater intuitive appeal to discriminating configural attainment than
predicting some weighted average of configurations.
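The arithmetic behind the configuration count, and what is lost when a configuration collapses into a single weighted composite, can be sketched as follows. The weights are hypothetical and serve only to contrast two extremes.

```python
from itertools import product

n_objectives = 10

# Every objective is either attained (1) or not (0).
configurations = list(product((0, 1), repeat=n_objectives))

def composite(configuration, weights):
    """Collapse a configuration into one weighted composite score."""
    return sum(w * a for w, a in zip(weights, configuration))

# Equal weights: the composite is just the number of objectives attained.
equal_weights = [1.0] * n_objectives
equal_scores = {composite(c, equal_weights) for c in configurations}

# Hypothetical weights that happen to keep every configuration distinct.
binary_weights = [2 ** i for i in range(n_objectives)]
binary_scores = {composite(c, binary_weights) for c in configurations}

print(len(configurations), len(equal_scores), len(binary_scores))
```

Under equal weights the 2^10 configurations collapse onto only eleven composite values, which is one way to see the appeal of treating configurations as classes to be discriminated rather than averaged.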

[illegible] experiments with a process change in a certain program, a
comparison of multiple regressions of product variables on old and new sets of
[illegible] data on the changes in percentages of students
attaining objectives will give a provisional answer. However, the data may be
[illegible] school not in the formal sample, being picked up in the census data,
and because it is found in a single school, be available on a small sample. It
would need to be checked (cross-validated) on another group going through the
[illegible] schools are trying the same [illegible].



[illegible] to repeat and expand a little on a remark I made in conference
about input data. In order to control predictions of product and impact data
from process variables for differential input, you will need to include some
[illegible] experiences affecting performance on the objectives.
[illegible] perhaps impossible to pretest all sample students on all
objectives, but some, even crude, elicitation of prior work after school or on
[illegible] (for auto mechanics) or in a beauty shop (for cosmetologists)
should be devised and obtained. It is conceivable that such experiences may be
going on concurrently with the program process, thus contaminating the process
space effects. This may not be undesirable for achievement of the objectives,
but should not be treated as a process effect unless it is aided, abetted, and
[illegible] of the program. Logistically, information on this
needs to be obtained during or at the end of process, remembering to treat the
data in analysis as control variables, not properly part of either process or
product space.






The issue you raise (OP#1, page 10) about differential treatments for
different groups of students is one under continuing discussion in the
methodological literature involving heterogeneity of regression, moderator variables,
etc. Either Dave or I can give you many references. However, what appears to
be a recent breakthrough has been developed by Don Rock and his associates at
Educational Testing Service in Princeton, New Jersey, and is reported in the
current issue of the American Educational Research Journal. It involves a
[illegible] in regression, hierarchical grouping, and
discriminant analysis. Regarding the outcome probability tables you mentioned,
I enclose my "dream" paper. Some agencies (e.g., American College Testing) are
actually doing this kind of thing, perhaps prematurely.






[illegible] expertise on ways of obtaining cost data. I should
not think it necessary to get detailed cost data on the census basis, but enough
gross data to permit weighting more detailed sample data. I suspect too, that
working with clusters of objectives and meeting other difficulties discussed may
require some application of hierarchical grouping of programs and objectives.

[illegible] considerable experience with the logistic
problem of obtaining followup data after program completion. I agree with your
idea of getting general information on an actuarial basis and supplementing
this with greater depth "clinical" information on a small group. I would,
however, give lower priority to the latter. In regard to the sampling plan for
followups, I suggest:

followup all sample subjects coming out of small-enrollment programs.

[illegible] coming out of larger programs.

[illegible] no serious problem, [illegible] to perform at least one wave of
followup of nonrespondents.

The fact that you will have extensive input data on the subjects will
[illegible] nonrespondents and a basis for computing adjusting
weights for nonresponse bias in longitudinal data. Our experience has been
[illegible] to respond to mail followups, whites [illegible] "brights" than
"dulls". I understand from John Flanagan that [illegible] empirical studies
bearing on them. Dave will be able to give you more on this.
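One common way to compute adjusting weights of the kind mentioned above is a weighting-class adjustment: subjects are grouped on input data available for respondents and nonrespondents alike, and each respondent is weighted by the ratio of sampled to responding subjects in his class. The sketch below assumes that approach; the grouping variable `input_group` and the counts are hypothetical and are not taken from the followup studies referred to in the letter.

```python
from collections import defaultdict

def nonresponse_weights(subjects, class_of):
    """Weighting-class adjustment: sampled count / respondent count per class."""
    sampled = defaultdict(int)
    responded = defaultdict(int)
    for subject in subjects:
        cls = class_of(subject)
        sampled[cls] += 1
        if subject["responded"]:
            responded[cls] += 1
    # Classes with no respondents cannot be adjusted and are left out.
    return {cls: sampled[cls] / responded[cls]
            for cls in sampled if responded[cls] > 0}

# Hypothetical followup sample, classified on an input-space variable.
subjects = (
    [{"input_group": "high", "responded": True}] * 8
    + [{"input_group": "high", "responded": False}] * 2
    + [{"input_group": "low", "responded": True}] * 5
    + [{"input_group": "low", "responded": False}] * 5
)

weights = nonresponse_weights(subjects, lambda s: s["input_group"])
print(weights)
```

The adjustment removes bias only to the extent that respondents and nonrespondents within a class resemble one another, which is why extensive input data on all subjects is valuable.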

It is crucial, if you plan to followup by mail, that viable addresses
be obtained and maintained. We have found the student's home address very
useful since his family often forwards mail to him. This information should be
[illegible] maintained as a confidential name and address [illegible].
[illegible] literature from our shop on mailout
bias control techniques and on confidentiality issues.

[illegible] little comment on the analytical data types beyond an
appreciation [illegible] MISOE DOT codes and to note
that most of the issues raised should be tractable when the IPPI space
differentiations [illegible] implemented. I look for enlightenment on
ways of establishing dollar equivalence of non-economic outcome variables. Jack
has mentioned some general principles which if properly applied seem critical
[illegible] both Project [illegible] and [illegible] Program. In neither case
are they as tightly linked to cost-benefit data as we would like.

I promised Elizabeth Weinberger some army and navy contacts, parallel
[illegible] gave her about occupational analysis, thinking that
[illegible] of information in her instrumentation problem, and
so it might still be. I learned that there is a much larger effort going on in DOD
with interservice coordination which involves computerized occupational data
systems for management, training and manpower. I think you may wish to explore
what they are doing and decide what aspects are most useful to you and your
staff. I gather that the coordination is from the office of General Piatt,
Assistant Secretary for Manpower and Reserve Affairs, Director of Utilisation
in the Pentagon, and that the best contact at the level is a civilian, Mr.
Robert Groover, in charge of the Occupational Information Service Center.
Phone (202) 697-8244. The Dr. Raymond Christal whose name and address I gave
to Elizabeth and his colleague Dr. Robert A. Bottenberg at Lackland AFB developed
a computer occupational data analysis program (CODAP) which is used in the Air
Force and, I believe, by the Marines. Christal and Bottenberg each gave papers
at a NATO conference held last year in Cambridge and I believe getting copies
of their papers may be useful to you.

The army counterpart of Dr. Christal turns out to be Dr. Cecil Johnson
in the Behavior and Systems Research Laboratory, located in the Commonwealth
Building at 1300 Wilson Blvd. in Rosslyn, Virginia. I understand that the 202
area code can be used for any DOD branches in the area, even those located in
Virginia. If you have any difficulty, call DOD central operator, 202-545-6700.

Mr. Groover just returned my call and gave me the names and phone
numbers of the other service branches' key persons:

Navy: Commander Bruce Cormack of the Canadian Armed Forces,
on a tour of duty with the USN, has two offices. One at
Bolling AFB, phone: OX-3-2712. The other is in the
Personnel Research Division of BuKavers which is in the
process of moving this weekend to the Arlington naval annex
on Columbia Pike, the new number being OX-4-5626. I understand
that a Dr. Ballard in that unit is also knowledgeable
about the naval activity in this area.

USMC: Col. George Caradakis, Company D, Hdq. Battalion, Marine 
Corps Base, Quantico, Virginia (703-640-2890?). 

U.S. Coast Guard: Mr. Joe Cowan, (202) 426-0891, in the Psychological
Research Branch (P-1), U.S. Coast Guard Headquarters, 400 
Seventh Street, S.W., Washington, D.C. 20590. 

I also promised to send Martin Breslow some information about our
statistical computer package. On discussion with our data processing chief, I
learned that some difficult legal and other hassles would develop if we were
to try to give you the package itself, and we are out of our (outdated) manual.
In lieu of this, you should contact David Armor at the Harvard Computer Center
for information about the Fortran version of Data Text which he is developing
and which is about ready for use. A Data Text Primer is available from him, I
understand, for $5.00, and probably is the best thing to start with, before deciding
whether to negotiate for a copy of the package or to develop a modest version
in-house. Our system is an adaptation of an older version of Data Text and was
rather costly to adapt and convert.

Sincerely, 




John A. Creager

Research Associate

Enclosure

cc: Jacob J. Kaufman 



